OPTIMAL RATES OF CONVERGENCE FOR ESTIMATING THE
NULL DENSITY AND PROPORTION OF NONNULL EFFECTS IN 
LARGE-SCALE MULTIPLE TESTING 

By T. Tony Cai and Jiashun Jin

University of Pennsylvania and Carnegie Mellon University 

An important estimation problem that is closely related to large- 
scale multiple testing is that of estimating the null density and the 
proportion of nonnull effects. A few estimators have been introduced 
in the literature; however, several important problems, including the 
evaluation of the minimax rate of convergence and the construction 
of rate-optimal estimators, remain open. 

In this paper, we consider optimal estimation of the null density 
and the proportion of nonnull effects. Both minimax lower and upper 
bounds are derived. The lower bound is established by a two-point 
testing argument, where at the core is the novel construction of two 
least favorable marginal densities $f_1$ and $f_2$. The density $f_1$ is heavy
tailed both in the spatial and frequency domains and $f_2$ is a perturbation
of $f_1$ such that the characteristic functions associated with
$f_1$ and $f_2$ match each other in low frequencies. The minimax upper
bound is obtained by constructing estimators which rely on the em- 
pirical characteristic function and Fourier analysis. The estimator is 
shown to be minimax rate optimal. 

Compared to existing methods in the literature, the proposed pro- 
cedure not only provides more precise estimates of the null density 
and the proportion of the nonnull effects, but also yields more accu- 
rate results when used inside some multiple testing procedures which 
aim at controlling the False Discovery Rate (FDR). The procedure is 
easy to implement and numerical results are given. 

1. Introduction. Large-scale multiple testing is an important area in 
modern statistics with a wide range of applications including DNA microarray
studies, functional Magnetic Resonance Imaging analyses (fMRI) and
astronomical surveys. Since the seminal paper by Benjamini and Hochberg 
(1995) on false discovery rate (FDR) control, research in this area has been 
very active. See, for example, Efron et al. (2001), Storey (2002), Genovese 
and Wasserman (2004), van der Laan, Dudoit and Pollard (2004) and Sun 
and Cai (2007). Properties of FDR-controlling procedures have been stud- 
ied, for example, in Finner, Dickhaus and Roters (2009) and Neuvial (2008). 
See also Abramovich et al. (2006) and Donoho and Jin (2006) for estimation 
using a multiple testing approach. 

In large-scale multiple testing, one tests simultaneously a large number of 
null hypotheses 

(1.1) $H_1, H_2, \ldots, H_n.$

Frequently, associated with each hypothesis $H_j$ is a test statistic $X_j$, which
can be a z-score, a p-value, a summary statistic, etc., depending on the
situation. The goal is to use the test statistics to determine which hypotheses
are true and which are false. We call $X_j$ a null effect if $H_j$ is true and a
nonnull effect otherwise.

A commonly used and effective framework for large-scale multiple testing 
is the so-called two-group random mixture model which assumes that each 
hypothesis has a given probability of being true and the test statistics are 
generated from a mixture of two densities; see, for example, Efron et al.
(2001), Newton et al. (2001), Storey (2002) and Sun and Cai (2007). In
detail, let $\theta = (\theta_1, \ldots, \theta_n)$ be independent
Bernoulli($\varepsilon$) variables, where $\varepsilon \in (0, 1)$;
$\theta_j = 0$ indicates that the null hypothesis $H_j$ is true and
$\theta_j = 1$ otherwise. When $\theta_j = 0$, $X_j$ is generated from a
density $f^{\mathrm{null}}(x)$. When $\theta_j = 1$, $X_j$ is generated from
another (alternative) density $f^{\mathrm{alt}}(x)$. Marginally, $X_j$
obeys the following two-group random mixture model:

(1.2) $X_j \stackrel{\mathrm{i.i.d.}}{\sim} (1 - \varepsilon) f^{\mathrm{null}} + \varepsilon f^{\mathrm{alt}} \equiv f, \qquad j = 1, \ldots, n,$

where $f^{\mathrm{null}}$, $f^{\mathrm{alt}}$ and $\varepsilon$ are called the null density, nonnull density and
proportion of nonnull effects, respectively.
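The two-group model (1.2) can be simulated directly, which is useful for checking any estimator discussed later. The following is a minimal sketch; the function name `simulate_two_group`, the $N(0,1)$ null and the $N(3,1)$ alternative are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def simulate_two_group(n, eps, sample_null, sample_alt, rng):
    """Draw X_1..X_n from the mixture (1 - eps) * f_null + eps * f_alt."""
    theta = rng.random(n) < eps            # theta_j = 1 marks a nonnull effect
    x_null = sample_null(n, rng)           # candidate draws from the null density
    x_alt = sample_alt(n, rng)             # candidate draws from the nonnull density
    return np.where(theta, x_alt, x_null), theta.astype(int)

# Illustrative choices (assumptions): N(0, 1) null, N(3, 1) alternative.
rng = np.random.default_rng(0)
x, theta = simulate_two_group(
    10_000, 0.1,
    sample_null=lambda n, r: r.normal(0.0, 1.0, n),
    sample_alt=lambda n, r: r.normal(3.0, 1.0, n),
    rng=rng,
)
```

Returning the labels `theta` alongside the data makes it possible to compare an estimated proportion with the realized fraction of nonnull effects.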

An important estimation problem that is closely related to multiple testing
is that of estimating $f^{\mathrm{null}}$, $\varepsilon$ and $f$. In fact, many commonly used multiple
testing procedures require good estimators of some or all of these three
quantities. See Benjamini and Hochberg (2000), Efron et al. (2001), Storey
(2002), Genovese and Wasserman (2004), Benjamini, Krieger and Yekutieli
(2006), Blanchard and Roquain (2007) and Sun and Cai (2007). For example,
in an empirical Bayes framework, Efron et al. (2001) introduced the
local false discovery rate (Lfdr) which is defined as

(1.3) $\mathrm{Lfdr}(x) = \frac{(1 - \varepsilon) f^{\mathrm{null}}(x)}{f(x)}.$

Lfdr has a useful Bayesian interpretation as the a posteriori probability of a
hypothesis being in the null group given the value of the test statistic. See
also Müller et al. (2004). Sun and Cai (2007) considered the multiple testing
problem from a compound decision theoretical point of view and showed
that the Lfdr is a fundamental quantity which can be used directly for
optimal FDR control. Calculating the Lfdr clearly requires knowledge
of $\varepsilon$, $f^{\mathrm{null}}$ and $f$. In real applications, the proportion $\varepsilon$ and the marginal
density $f$ are unknown and thus need to be estimated from the data. The
null density $f^{\mathrm{null}}$ is more subtle. In many studies the null distribution is
assumed to be known and can be used directly for multiple testing. However,
somewhat surprisingly, Efron (2004) demonstrated convincingly that
in some applications, such as the analysis of microarray data on breast cancer
and human immunodeficiency virus (HIV), the true null distribution of the
test statistic can be quite different from the theoretical null. Possible
causes for such a phenomenon include, but are not limited to, unobserved
covariates and correlations across different arrays and different genes. It is
further illustrated in Jin and Cai (2007) that two seemingly close choices of the
null distribution can lead to substantially different testing results. Hence, a
careful study on how to estimate the null distribution is also indispensable.
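To make the dependence of the Lfdr on $\varepsilon$, $f^{\mathrm{null}}$ and $f$ concrete, here is a small sketch that evaluates (1.3) when all three quantities are treated as known. The helper names `normal_pdf` and `lfdr`, and the Gaussian densities used in the usage line, are assumptions made for illustration only.

```python
import numpy as np

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (np.asarray(x, dtype=float) - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def lfdr(x, eps, f_null, f_alt):
    """Local false discovery rate (1 - eps) * f_null(x) / f(x),
    with marginal density f = (1 - eps) * f_null + eps * f_alt."""
    null_part = (1.0 - eps) * f_null(x)
    marginal = null_part + eps * f_alt(x)
    return null_part / marginal

# Usage with assumed densities: N(0, 1) null and N(3, 1) alternative.
values = lfdr(np.array([0.0, 5.0]), 0.1,
              f_null=lambda x: normal_pdf(x),
              f_alt=lambda x: normal_pdf(x, 3.0, 1.0))
```

In practice the three inputs are replaced by estimates, which is why errors in estimating the null density or the proportion propagate into the testing results.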

In the present paper we study the problem of optimal estimation of the
null density $f^{\mathrm{null}}$ and the proportion $\varepsilon$. We should mention that estimating
the marginal density $f$ is a standard density estimation problem and is
well understood. See, for example, Silverman (1986). Several methods for
estimating the null density $f^{\mathrm{null}}$ and the proportion $\varepsilon$ have been introduced
in the literature. See Efron (2004, 2008) and Jin and Cai (2007) for estimating
$f^{\mathrm{null}}$ and $\varepsilon$, and see Genovese and Wasserman (2004), Meinshausen
and Rice (2006), Cai, Jin and Low (2007), Jin (2008) and Celisse and Robin
(2008) for estimating $\varepsilon$ [also see Storey (2002), Efron et al. (2001), Swanepoel
(1999)]. Unfortunately, despite the encouraging progress in these works, the
optimality of the estimators is largely unknown [it is, however, not hard to
show that some of these estimators are generally inconsistent in the nonsparse
case; see, e.g., Jin and Cai (2007)]. It is hence of significant interest
to understand how well $f^{\mathrm{null}}$ and $\varepsilon$ can be estimated and to what extent
improving the estimation accuracy of $f^{\mathrm{null}}$ and $\varepsilon$ can help to enhance the
performance of leading contemporary multiple testing procedures [including
but not limited to those by Benjamini and Hochberg (1995), Efron et al.
(2001) and Sun and Cai (2007)]. Multiple testing procedures that adapt to
$\varepsilon$, without estimating it directly, have also been proposed recently in Blanchard
and Roquain (2007) and Finner, Dickhaus and Roters (2009).

In this paper, we focus on the Gaussian mixture model as in Efron (2004).
We model $f^{\mathrm{null}}$ as Gaussian, but both the mean and the variance are unknown
and need to be estimated:

(1.4) $f^{\mathrm{null}}(x) = \frac{1}{\sigma_0} \phi\left(\frac{x - u_0}{\sigma_0}\right), \qquad \phi$: density of $N(0, 1)$.

We shall use the terminology in Efron (2004) by calling $\sigma_0^2$ the null variance
parameter, $u_0$ the null mean parameter, and together the null parameters.
The Gaussian model for $f^{\mathrm{null}}$ is somewhat idealized, but it is a reasonable
choice. On one hand, assuming $f^{\mathrm{null}}$ to be Gaussian helps to re-normalize the
null distribution and is therefore a good starting point in large-scale multiple
testing. On the other hand, allowing $f^{\mathrm{null}}$ to be in a much broader class will
lead to identifiability problems. The nonnull distribution $f^{\mathrm{alt}}$ is modeled by
a Gaussian location-scale mixture,

(1.5) $f^{\mathrm{alt}}(x) = \int \frac{1}{\sigma} \phi\left(\frac{x - u}{\sigma}\right) dH(u, \sigma),$

where $H$ is called the mixing distribution. In addition to the mathematical
tractability that it offers, model (1.5) also offers great flexibility. For example,
it is well known that under the $L^1$-metric, the set of Gaussian mixing
densities of the form in (1.5) is dense in the set of all density functions. Also,
model (1.5) is able to capture the essence of many application examples. See
Jin (2008) for an example on the analysis of gene microarray data on breast
cancer and an example on the study of the abundance of the Kuiper Belt
Objects.

We consider the asymptotic minimax estimation problem and address
several inter-connected questions: what are the optimal rates of convergence,
what are the best estimation tools, and where do the difficulties of the
estimation problem come from? Our analysis reveals that the optimal rates
of convergence for estimating the proportion and the null parameters depend
on the smoothness of $H(u, \sigma)$ (more specifically, the conditional density of $u$
given $\sigma$ associated with $H$). For an intuitive explanation, we note that $f^{\mathrm{null}}$
and $f^{\mathrm{alt}}$ are the convolution of the standard Gaussian with the point mass
concentrated at $(u_0, \sigma_0)$ and $H$, respectively. Therefore, the smoother $H$ is,
the more "different" it is from a point mass, and the less similar $f^{\mathrm{null}}$
and $f^{\mathrm{alt}}$ are. Consequently, it is easier to separate one from the other, and
hence one obtains a faster convergence rate in estimating the proportion and the null
parameters.

Since the smoothness of a density can be conveniently characterized by 
the tail behavior of its characteristic function, this suggests that frequency 
domain techniques can be naturally used for studying the optimal rate of 
convergence. Along this line, we first derive a minimax lower bound by 
a careful analysis of the tail behavior of the characteristic functions and 
by a two-point testing technique. We then establish the upper bound by 
constructing estimators whose risks converge to zero at the same rate
as that of the lower bound; such estimators are then rate optimal. The
procedures are closely related to our recent work, Jin and Cai (2007) and Jin
(2008), which to the best of our knowledge are the only frequency-domain-based
approaches to estimating the null parameters and the proportion of
nonnull effects. We should emphasize that the upper bound does not follow
trivially from that in Jin and Cai (2007) and Jin (2008). For example, it 
is seen that the procedure for estimating the proportion proposed in Jin 
and Cai (2007) and Jin (2008) is not optimal, and careful modifications are 
needed to make it optimal. Also, to prove the optimality of the procedures 
here, we need much more delicate analysis than that in Jin and Cai (2007) 
and Jin (2008), where the scope of the study is limited to the consistency of 
the procedures. 

In addition to the asymptotic analysis, we also investigate the finite n 
performance of the estimators using simulated data. The proposed procedures
are easy to implement. The goal of the simulation study is twofold: to assess
how accurately the parameters are estimated and how the errors in the point
estimation affect the results of the subsequent multiple testing. The numerical
study shows that our estimators enjoy superior performance both in
parameter estimation (measured by mean squared errors) and in the subse- 
quent multiple testing. Our estimator of the proportion performs well uni- 
formly in all the cases in comparison to the estimators proposed in Storey 
(2002) and Efron (2004). In particular, it is robust under many different 
choices of nonnull distribution and sparsity level. The multiple testing re- 
sults are generally sensitive to the changes in the null parameters as well as 
the proportion. In our numerical study, we compare the performance of our 
estimators with those of Storey (2002) and Efron (2004) using two specific 
multiple testing procedures, the adaptive p-value based procedure of Benjamini
and Hochberg (2000) which requires estimation of the proportion $\varepsilon$,
and the AdaptZ procedure of Sun and Cai (2007) which requires estimation
of $\varepsilon$, $f$ and $f^{\mathrm{null}}$. The simulation study shows that our estimators yield the
most accurate multiple testing results in both cases in comparison to the 
other two estimators. 

The paper is organized as follows. In Section 2, after basic notation and 
definitions are introduced, we consider the minimax lower bound for estimat- 
ing the null parameters. We then derive the minimax rates of convergence 
by showing that the lower bound is in fact sharp. This is accomplished by 
constructing rate-optimal estimators using the empirical characteristic func- 
tions. Section 3 studies the minimax estimation of the proportion. We first 
consider the simpler case where the null parameters are given and then ex- 
tend the result to the case where the null parameters are unknown. Section 
4 investigates the numerical performance of our procedure by a simulation 
study. Section 5 discusses possible extensions of our work and its connections
with the nonparametric deconvolution problem. The proofs of the main
results are given in Section 6 and the Appendix contains the proofs of the
technical lemmas that are used to prove the main results. 

2. Estimating the null parameters: Minimax risk and rate optimal es- 
timators. In this section, we study the minimax risks for estimating the 
null parameters. The minimax lower bounds are established by a two-point 
testing argument in Section 2.1. At the core of the argument is the con- 
struction of two underlying densities whose corresponding null parameters 
are different but whose characteristic functions match with each other in 
low frequencies. We then derive the minimax upper bounds by constructing 
and studying rate optimal estimators in Section 2.2. 

Return to the Gaussian mixture model

(2.1) $X_j \stackrel{\mathrm{i.i.d.}}{\sim} (1 - \varepsilon) \frac{1}{\sigma_0} \phi\left(\frac{x - u_0}{\sigma_0}\right) + \varepsilon \int \frac{1}{\sigma} \phi\left(\frac{x - u}{\sigma}\right) dH(u, \sigma) \equiv f(x).$

For any mixing distribution $H(u, \sigma)$ under consideration, let $H(\sigma)$ be the
marginal distribution of $\sigma$ and let $H(u|\sigma)$ be the conditional distribution of
$u$ given $\sigma$.

Definition 2.1. We call a density $f$ eligible if it has the form as in (2.1)
where $H(u, \sigma)$ satisfies that $H(\sigma)$ is supported on $[\sigma_0, \infty)$ and that $H(u|\sigma)$
has a density $h(u|\sigma)$ for any $\sigma \ge \sigma_0$. We denote the set of all eligible $f$ by
$\mathcal{F}$.

Two examples of eligible $f$ are (1) $H(\sigma)$ is supported in $[\sigma_0 + \delta, \infty)$ for
some constant $\delta > 0$, and (2) $H(\sigma)$ is the point mass at $\sigma_0$, and $H(u|\sigma_0)$
has a density.

In this paper, we focus on eligible $f$, so that the null parameters and the
proportion of nonnull effects are both identifiable. See Jin and Cai (2007)
for more discussion on identifiability.

We shall define the parameter space of $f$ for the minimax theory. First,
we suppose that for some fixed constants $q > 0$ and $A > a > 0$,

(2.2) $\sigma_0 \ge a, \qquad \int |x|^q f(x)\, dx \le A^q,$

so that $\sigma_0^2$ and $u_0$ are uniformly bounded across the whole parameter space.
Second, fix $\alpha > 0$. We assume

(2.3) $\lim \sup \{ |t|^{\alpha} |\hat{h}(t|\sigma)| \} \le A, \qquad \lim \sup \{ |t|^{\alpha+1} |\hat{h}'(t|\sigma)| \} \le A,$

where $h(u|\sigma)$ is the aforementioned conditional density, $\hat{h}(t|\sigma)$ is the
corresponding characteristic function and

(2.4) $\hat{h}(t|\sigma) = \hat{h}(t|\sigma; u_0) = \int e^{itu} h(u + u_0|\sigma)\, du.$

Roughly speaking, (2.3) requires $h(u|\sigma)$ to be sufficiently smooth so that
$\hat{h}(t|\sigma)$ decays at a rate not slower than that of $|t|^{-\alpha}$. We shall see below that
the minimax risk depends on the smoothness parameter $\alpha$. Note that in (2.2)
and (2.3), different constants $A$ can be used in different places. However, this
does not change the minimax rate of convergence, so we use the same $A$ for
simplicity.

Last, we calibrate the proportion $\varepsilon$. In the literature, the proportion is a
well-known measure for sparsity; see, for example, Abramovich et al. (2006)
and Jin and Cai (2007). In this paper, we focus on the moderately sparse
case where the proportion $\varepsilon = \varepsilon_n$ can be small but not smaller than $1/\sqrt{n}$.
The case $\varepsilon_n \ll 1/\sqrt{n}$ is called the very sparse case and has been proven
to be much more challenging for statistical inference; see Donoho and Jin
(2004) and Cai, Jin and Low (2007) for detailed discussion. In light of this,
we suppose that for some fixed parameters $\varepsilon_0 \in (0, 1)$ and $\beta \in [0, 1/2)$,

(2.5) $\varepsilon_n \le \eta_n, \qquad$ where $\eta_n = \eta_n(\varepsilon_0, \beta) = \varepsilon_0 n^{-\beta}.$

Note that $\eta_n = \varepsilon_0$ when $\beta = 0$. For this reason, we require $\varepsilon_0 < 1$ so that the
null component will not be vanishingly small.

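In summary, the parameter space we consider for the minimax risk is

(2.6) $\mathcal{F}_0 = \mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n) = \{ f \in \mathcal{F}$ and satisfies (2.2), (2.3) and (2.5)$\}.$

We measure the performance of an estimator for the null parameters by
mean squared errors, and measure the level of difficulty for the problem
of estimating the null parameters $\sigma_0^2$ and $u_0$ by the minimax risks defined,
respectively, by

$R_n^{\sigma} = R_n^{\sigma}(\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)) = \inf_{\hat{\sigma}_0^2} \left\{ \sup_{f \in \mathcal{F}_0} E[\hat{\sigma}_0^2 - \sigma_0^2]^2 \right\}$

and

$R_n^{u} = R_n^{u}(\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)) = \inf_{\hat{u}_0} \left\{ \sup_{f \in \mathcal{F}_0} E[\hat{u}_0 - u_0]^2 \right\}.$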

2.1. Lower bound for the minimax risk. In this section, we establish the
lower bound for the minimax risk of estimating $\sigma_0^2$ and $u_0$. As the discussions
are similar, we shall focus on that for $\sigma_0^2$. We use the well-known two-point
testing argument to show the lower bound [see, e.g., Ibragimov, Nemirovskii
and Khas'minskii (1986) and Donoho and Liu (1991)], where the key is to
construct two density functions in $\mathcal{F}_0$, $f_1(x)$ and $f_2(x)$, such that the null
variance parameters associated with them differ by a small amount, say $\delta_n$,
but the two densities are indistinguishable in the sense that their $\chi^2$-distance

(2.7) $d(f_1, f_2) \equiv \int \frac{(f_2(x) - f_1(x))^2}{f_1(x)}\, dx$

is of a smaller order than that of $1/n$. In fact, once such densities $f_1$ and $f_2$
are constructed, then there is a constant $C > 0$ such that

(2.8) $R_n^{\sigma} \ge C \delta_n^2$

and $C \delta_n^2$ is a lower bound for the minimax risk; see Ibragimov, Nemirovskii
and Khas'minskii (1986) and Donoho and Liu (1991) for details.

To this end, let

$a_n^2 = a^2 + \delta_n,$

where $\delta_n > 0$ is to be determined. Our construction of $f_1$ and $f_2$ has the form
of

(2.9) $f_1(x) = (1 - \eta_n) \frac{1}{a} \phi\left(\frac{x}{a}\right) + \eta_n \int \frac{1}{a} \phi\left(\frac{x - u}{a}\right) h_1(u)\, du,$

(2.10) $f_2(x) = (1 - \eta_n) \frac{1}{a_n} \phi\left(\frac{x}{a_n}\right) + \eta_n \int \frac{1}{a_n} \phi\left(\frac{x - u}{a_n}\right) h_2(u)\, du,$

where $a$ and $\eta_n$ are as in the definition of $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$, and $h_1(u)$ and
$h_2(u)$ are two density functions to be determined (note that the null variance
parameters associated with $f_1$ and $f_2$ differ by an amount of $\delta_n$). There are
two key elements in our construction. First, the characteristic functions of
$f_1$ and $f_2$ match each other in low frequencies, that is, for a constant
$\tau = \tau_n$ to be determined,

(2.11) $\hat{f}_1(t) = \hat{f}_2(t) \qquad \forall |t| \le \tau_n.$

Second, $f_1$ is heavy-tailed in the spatial domain,

(2.12) $f_1(x) \ge C \eta_n (1 + |x|)^{-k} \qquad \forall x,$

where $k > 0$ is an integer to be determined. Below, we first show that the
$\chi^2$-distance between $f_1$ and $f_2$ is $o(1/n)$ if we take the $\tau_n$ in (2.11)
to be

(2.13) $\tau_n = \frac{1}{a} \sqrt{3 \log n}.$

We then sketch how to construct $f_1$ and $f_2$ to satisfy (2.11) and (2.12), and
discuss how large $\delta_n$ could be so that such a construction is possible. We
conclude this subsection with the statement of the minimax lower bounds.
To focus on the main ideas, we try to be simple and heuristic in this section
and leave proof details to Section 6.

We now begin by investigating the $\chi^2$-distance. First, the heavy-tailed
property of $f_1$ largely simplifies the calculation of the $\chi^2$-distance. In fact, by
(2.12) and the well-known Parseval formula [Mallat (1998)], the $\chi^2$-distance
is controlled by the $L^2$-distance in the spatial domain, and so by the
$L^2$-distance in the frequency domain,

$d(f_1, f_2) \le C \log^{k/2}(n)\, \eta_n^{-1} \int (f_2(x) - f_1(x))^2\, dx = C \log^{k/2}(n)\, \eta_n^{-1} \int |\hat{f}_2(t) - \hat{f}_1(t)|^2\, dt.$

See Section 6 for the proof. Moreover, since $f_1$ and $f_2$ match each other
in low frequencies, and $|\hat{f}_j(t)| \le C e^{-a^2 t^2/2}$ for $j = 1, 2$,

$\int |\hat{f}_1(t) - \hat{f}_2(t)|^2\, dt = \int_{|t| > \tau_n} |\hat{f}_1(t) - \hat{f}_2(t)|^2\, dt \le C \int_{|t| > \tau_n} e^{-a^2 t^2/2}\, dt.$

Putting these together,

(2.14) $d(f_1, f_2) \le C \log^{k/2}(n)\, \eta_n^{-1} e^{-a^2 \tau_n^2/2} = C \eta_n^{-1} \log^{k/2}(n)\, n^{-3/2}.$

Since $\eta_n \gg 1/\sqrt{n}$, this shows that the $\chi^2$-distance $d(f_1, f_2) = o(1/n)$.
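The arithmetic behind the last step can be spelled out as follows. This is only a worked restatement, assuming the choice $\tau_n = \frac{1}{a}\sqrt{3 \log n}$ from (2.13) and the extreme calibration $\eta_n = \varepsilon_0 n^{-\beta}$ with $\beta < 1/2$ from (2.5).

```latex
\begin{align*}
e^{-a^2 \tau_n^2/2}
  &= \exp\!\left(-\frac{a^2}{2} \cdot \frac{3 \log n}{a^2}\right)
   = n^{-3/2}, \\
n \cdot d(f_1, f_2)
  &\le C\, n \cdot \eta_n^{-1} \log^{k/2}(n)\, n^{-3/2}
   = \frac{C}{\varepsilon_0}\, \log^{k/2}(n)\, n^{\beta - 1/2}
   \longrightarrow 0 \quad (n \to \infty).
\end{align*}
```

The polynomial factor $n^{\beta - 1/2}$ dominates the logarithmic factor precisely because $\beta < 1/2$, which is where the moderately sparse calibration enters.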

Next, we sketch the idea for constructing $f_1$ and $f_2$. Consider $f_1$ first. We
construct $h_1$ as a perturbation of the standard normal density,

(2.15) $h_1(u) = \phi(u) + \vartheta_0 w_1(u).$

The key is to show that for an appropriate constant $\vartheta_0 > 0$ and a function
$w_1$, $h_1$ is indeed a density function, and $f_1$ satisfies the heavy-tailed
requirement (2.12). Let $k$ be an even number. We construct $w_1(u)$ through
its characteristic function as follows: $\hat{w}_1(t) = \frac{(-1)^{k/2} \pi}{(k-1)!} |t|^{k-1}$ in the vicinity
of 0, $\hat{w}_1(t) = |t|^{-\alpha}$ for large $|t|$, and is smooth in between [details are
given later in (6.1)]. By elementary Fourier analysis, first, we note that
$\int w_1(u)\, du = \hat{w}_1(0) = 0$. Second, we note that the tail behavior of $w_1$ is
determined by the only singular point of $\hat{w}_1$ (which is $t = 0$); in fact, by
repeatedly using integration by parts, we have that for large $u$, $w_1(u) \sim |u|^{-k}$,
that is,

(2.16) $\lim_{|u| \to \infty} w_1(u) |u|^k = 1.$

We shall see that, first, (2.16) implies the heavy-tailed property of $f_1$, and
second, (2.16) ensures that $w_1(u)$ is positive for sufficiently large $u$, so $h_1$ is
a density function for an appropriately small $\vartheta_0 > 0$. Additionally, we will
justify later that $f_1$ belongs to $\mathcal{F}_0$. Therefore, $f_1$ constructed this way meets
all the desired requirements.

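Now consider $f_2$. Similarly, we construct $h_2$ as a perturbation of a normal
density,

(2.17) $h_2(u) = \frac{1}{\sqrt{1 - \delta_n}} \phi\left(\frac{u}{\sqrt{1 - \delta_n}}\right) + \vartheta_0 w_2(u),$

and the key is to construct $w_2$ so that $f_1$ and $f_2$ match in low frequencies.
Note that

$\hat{f}_1(t) = \eta_n e^{-(a^2+1)t^2/2} + e^{-a^2 t^2/2} [(1 - \eta_n) + \vartheta_0 \eta_n \hat{w}_1(t)]$

and

$\hat{f}_2(t) = \eta_n e^{-(a^2+1)t^2/2} + e^{-a_n^2 t^2/2} [(1 - \eta_n) + \vartheta_0 \eta_n \hat{w}_2(t)].$

By direct calculations, in order for $f_1$ and $f_2$ to match in low frequencies, it
is necessary that

(2.18) $\hat{w}_2(t) = \hat{w}(t)$ for all $|t| \le \tau_n$, where $\hat{w}(t) = e^{\delta_n t^2/2} \hat{w}_1(t) + \frac{1 - \eta_n}{\vartheta_0 \eta_n} [e^{\delta_n t^2/2} - 1].$

In light of this, we construct $w_2$ through its characteristic function as follows:
$\hat{w}_2(t) = \hat{w}(t)$ for $|t| \le \tau_n$, $\hat{w}_2(t) = 0$ for $|t| > \tau_n + 1$, and is smooth in between.
Figure 1 illustrates the construction of $\hat{w}_1$ and $\hat{w}_2$; see details therein.

We now investigate what is the largest $\delta_n$ so that $f_2$ constructed this way
belongs to $\mathcal{F}_0$. By the definition of $\mathcal{F}_0$, it is necessary that $|\hat{h}_2(t)| \le A |t|^{-\alpha}$
for all $t$, and especially that $|\hat{h}_2(\tau_n)| \le A \tau_n^{-\alpha}$. Recall that $\hat{w}_1(\tau_n) = \tau_n^{-\alpha}$;
since the Gaussian part of $\hat{h}_2$ is negligible at $t = \tau_n$, we have

$\hat{h}_2(\tau_n) \approx \vartheta_0 e^{\delta_n \tau_n^2/2} \hat{w}_1(\tau_n) + \frac{1 - \eta_n}{\eta_n} [e^{\delta_n \tau_n^2/2} - 1] \sim O\left(\tau_n^{-\alpha} + \frac{\delta_n}{\eta_n} \tau_n^2\right).$

Together, these require that

$\delta_n \le C \eta_n \tau_n^{-(\alpha+2)}.$

In light of this, we calibrate $\delta_n$ as

(2.19) $\delta_n = \theta_0 \eta_n \tau_n^{-(\alpha+2)},$

where $\theta_0 > 0$ is a constant to be determined. Interestingly, it turns out that
for an appropriately small $\theta_0$, $w_2$ constructed in this way ensures that $h_2$ is
a density function and that $f_2$ lives in $\mathcal{F}_0$ (see Section 6). Therefore, the largest
possible $\delta_n$ is of the order of $O(\eta_n \tau_n^{-(\alpha+2)})$.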

We are now ready to state the minimax lower bounds. Let $M_q$ be the $q$th
moment of the standard normal [i.e., $M_q = E|X|^q$ with $X \sim N(0, 1)$]. The
following theorem is proved in Section 6.

Theorem 2.1. Fix $\alpha > 2$, $\beta \in [0, 1/2)$, $\varepsilon_0 \in (0, 1)$, $q > 0$, $a > 0$ and
$A > \sqrt{a^2 + 1}\, M_q^{1/q}$. There is a constant $C > 0$ which depends on $\alpha$, $\beta$, $\varepsilon_0$, $q$, $a$ and
$A$ such that

$\lim_{n \to \infty} n^{2\beta} \cdot (\log n)^{(\alpha+2)} \cdot R_n^{\sigma}(\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)) \ge C$

and

$\lim_{n \to \infty} n^{2\beta} \cdot (\log n)^{(\alpha+1)} \cdot R_n^{u}(\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)) \ge C.$

Fig. 1. The first three panels illustrate $\hat{w}_1(t)$ (red), $\hat{w}(t)$ (blue) and $\hat{w}_2(t)$ (green). Note
that $\hat{w}$ is not a characteristic function as $\hat{w}(t) > 1$ for large $|t|$, and that $\hat{w}_2$ is a truncated
version of $\hat{w}$. The last panel is the overlay and zoom in of the first three panels.

Due to the calibrations we choose in (2.3) and (2.5), the optimal rate is
expressed in terms of the parameters $\alpha$, $\beta$. Such calibrations are mainly for
simplicity in the presentation: Theorem 2.1 (as well as Theorems 2.2, 3.1 and
3.2 below) can be extended to more general settings. Here is an example. Fix
$\varepsilon_0 \in (0, 1)$ and $\beta \in [0, 1/2)$, and suppose we (a) modify the calibration of $\varepsilon_n$ in
(2.5) into $\eta_n \le \varepsilon_n \le \varepsilon_0$ with $\eta_n$ being a sequence satisfying $\eta_n \ge \varepsilon_0 n^{-\beta}$,
and (b) change the parameter space from $\mathcal{F}_0$ to $\mathcal{F}_0' = \mathcal{F}_0'(\alpha, \beta, q, a, A, \eta_n; n)$,
where

$\mathcal{F}_0'(\alpha, \beta, q, a, A, \eta_n; n) = \{ f \in \mathcal{F}$ and satisfies (2.2), (2.3), and constraints on $\varepsilon_n$ above$\}.$
The following corollary can be proved in the same way as Theorem 2.1.

Corollary 2.1. Fix $\alpha > 2$, $\beta \in [0, 1/2)$, $\varepsilon_0 \in (0, 1)$, $q > 0$, $a > 0$ and
$A > \sqrt{a^2 + 1}\, M_q^{1/q}$, and let $\varepsilon_n$ and $\eta_n$ be calibrated as above. There is a constant
$C > 0$ which depends on $\alpha$, $\beta$, $\varepsilon_0$, $q$, $a$ and $A$ such that

$\lim_{n \to \infty} \eta_n^{-2} \cdot (\log n)^{(\alpha+2)} \cdot R_n^{\sigma}(\mathcal{F}_0'(\alpha, \beta, q, a, A, \eta_n; n)) \ge C$

and

$\lim_{n \to \infty} \eta_n^{-2} \cdot (\log n)^{(\alpha+1)} \cdot R_n^{u}(\mathcal{F}_0'(\alpha, \beta, q, a, A, \eta_n; n)) \ge C.$

We remark that for the case $\beta > 0$, the condition $A > \sqrt{a^2 + 1}\, M_q^{1/q}$ can
be relaxed to that of $A > a M_q^{1/q}$. The latter is the minimum requirement, for
otherwise $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$ is an empty set. Theorem 2.1 shows that the
minimax risk for estimating $\sigma_0^2$ cannot converge to 0 faster than
$O(n^{-2\beta} \cdot (\log n)^{-(\alpha+2)})$, and that for estimating $u_0$ cannot converge to 0 faster
than $O(n^{-2\beta} \cdot (\log n)^{-(\alpha+1)})$. In Section 2.2 we show that these rates can be
attained and thus establish the minimax rates of convergence.

2.2. Rate optimal estimators for the null parameters. In this section, we
seek estimators of the null parameters whose risks converge at the same
rates as those of the lower bounds. Once such estimators are constructed,
then their risks give upper bounds for the minimax risks, and the estimators
themselves are rate optimal.

Given that estimating the null parameters is a relatively new problem,
there are only a small number of methods in the literature. One straightforward
approach is the method of moments, and another approach, proposed
by Efron (2004), is to use the half-width of the central peak of the histogram.
However, these approaches are only consistent in the sparse case where the
proportion $\varepsilon = \varepsilon_n$ tends to 0 as $n$ tends to $\infty$. See Jin and Cai (2007) for
more discussion.

In our recent work [Jin and Cai (2007)], we demonstrated that the null
component can be well isolated in high-frequency Fourier coefficients, and
based on this observation, we introduced a Fourier approach for estimating
the null parameters. In detail, for any $t$ and complex-valued differentiable
function $\xi$, let $\mathrm{Im}(\xi)$ be the imaginary part and $\bar{\xi}$ be the complex conjugate.
We introduce two functionals as follows:

(2.20) $\sigma_0^2(t; \xi) = -\frac{\frac{d}{dt} |\xi(t)|}{t \cdot |\xi(t)|}, \qquad u_0(t; \xi) = \frac{1}{|\xi(t)|^2} \cdot \mathrm{Im}(\bar{\xi}(t) \cdot \xi'(t)).$

Next, fix $\gamma \in (0, 1/2)$, and let $\varphi_n(t)$ be the empirical characteristic function,

(2.21) $\varphi_n(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j},$

and

(2.22) $\hat{t}_n(\gamma) = \min\{ t : t > 0, |\varphi_n(t)| \le n^{-\gamma} \}.$

We define the estimators for $\sigma_0^2$ and $u_0$ as

$\hat{\sigma}_0^2(\gamma) = \sigma_0^2(\hat{t}_n(\gamma); \varphi_n), \qquad \hat{u}_0(\gamma) = u_0(\hat{t}_n(\gamma); \varphi_n).$

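To illustrate the idea behind the construction of these estimators, we
consider a simplified case where $f$ is a homoscedastic Gaussian location
mixture:

$f(x) = (1 - \varepsilon) \frac{1}{\sigma_0} \phi\left(\frac{x - u_0}{\sigma_0}\right) + \varepsilon \int \frac{1}{\sigma_0} \phi\left(\frac{x - u}{\sigma_0}\right) h(u)\, du, \qquad h$: a univariate density.

First, the empirical characteristic function approximates the underlying
characteristic function $\varphi(t) = \varphi(t; f) = E[e^{itX_j}]$,

$\varphi_n(t) \approx \varphi(t) = e^{-\sigma_0^2 t^2/2} [(1 - \varepsilon) e^{i u_0 t} + \varepsilon \hat{h}(t)].$

Second, by the well-known Riemann-Lebesgue lemma, for large $t$, $\hat{h}(t) \approx 0$,
so

$\varphi(t) \approx (1 - \varepsilon) e^{-\sigma_0^2 t^2/2} e^{i u_0 t} \equiv \varphi_0(t).$

Last, $\hat{t}_n(\gamma)$ approximates its nonstochastic counterpart $t_n(\gamma)$,

(2.23) $t_n(\gamma) = \min\{ t : t > 0, |\varphi(t)| \le n^{-\gamma} \}.$

Putting these together, we have that, heuristically,

$\hat{\sigma}_0^2(\gamma) \approx \sigma_0^2(t_n(\gamma); \varphi_0) = \sigma_0^2, \qquad \hat{u}_0(\gamma) \approx u_0(t_n(\gamma); \varphi_0) = u_0,$

where the equalities "=" follow from direct calculations. See more discussion in Jin and
Cai (2007).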
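The heuristic above can be tried numerically. The sketch below assumes that the two functionals take the form $\sigma_0^2(t;\xi) = -\frac{d}{dt}|\xi(t)| / (t|\xi(t)|)$ and $u_0(t;\xi) = \mathrm{Im}(\bar{\xi}(t)\xi'(t)) / |\xi(t)|^2$, uses the identity $\frac{d}{dt}|\xi| = \mathrm{Re}(\bar{\xi}\xi')/|\xi|$, and replaces the exact minimum in (2.22) by a search over a grid. The function names, the grid step and the upper limit `t_max` are illustrative assumptions.

```python
import numpy as np

def ecf_and_derivative(t, x):
    """Empirical characteristic function phi_n(t) and its derivative in t."""
    e = np.exp(1j * t * x)
    return e.mean(), (1j * x * e).mean()

def t_hat(x, gamma, t_max=10.0, step=0.01):
    """Grid approximation to min{t > 0 : |phi_n(t)| <= n^(-gamma)}."""
    threshold = len(x) ** (-gamma)
    for t in np.arange(step, t_max, step):
        phi, _ = ecf_and_derivative(t, x)
        if abs(phi) <= threshold:
            return t
    raise ValueError("threshold not reached on the grid; increase t_max")

def estimate_null_parameters(x, gamma=0.2):
    """Return (sigma0^2 estimate, u0 estimate, frequency used)."""
    x = np.asarray(x, dtype=float)
    t = t_hat(x, gamma)
    phi, dphi = ecf_and_derivative(t, x)
    prod = np.conj(phi) * dphi               # conj(phi_n) * phi_n'
    mod2 = abs(phi) ** 2
    sigma2 = -prod.real / (t * mod2)         # -(d/dt |phi_n|) / (t |phi_n|)
    u0 = prod.imag / mod2                    # Im(conj(phi_n) phi_n') / |phi_n|^2
    return sigma2, u0, t
```

For a pure null sample from $N(u_0, \sigma_0^2)$ the two returned values should be close to $\sigma_0^2$ and $u_0$; with a nonnull component present, the accuracy depends on how fast $\hat{h}(t)$ decays at the selected frequency.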

The above approach has been studied in Jin and Cai (2007), where it 
was shown to be uniformly consistent across a wide class of cases. However, 
whether any of these estimators attains the optimal rate of convergence re- 
mains an open question. The difficulty is two-fold. First, compared to the 
study on consistency as in Jin and Cai (2007), the study on the optimal rate 
of convergence needs a much more delicate analysis on several small probability
events. Tighter bounds on such events are not necessary for showing
the consistency, but they are indispensable for proving the optimal rate of
convergence. Second, a major technical difficulty is that the frequency $\hat{t}_n(\gamma)$
is stochastic and is not independent of the samples $X_j$. The stochasticity
and dependence pose challenges in evaluating the estimation risks, and are
the culprits for the lengthy analysis.

In this paper, we develop new analytical tools to solve these problems. The
new analysis provides better probability bounds on several nuisance events
and better control on the stochastic fluctuation of $\hat{t}_n(\gamma)$, $\hat{\sigma}_0^2(\gamma)$ and $\hat{u}_0(\gamma)$.
The analysis reveals that the estimators $\hat{\sigma}_0^2(\gamma)$ and $\hat{u}_0(\gamma)$ are in fact rate
optimal under minimum regularity conditions. This is the following theorem,
which is proved in Section 6.

Theorem 2.2. Fix $\gamma \in (0, 1/2)$, $\alpha > 2$, $\beta \in [0, 1/2)$, $\varepsilon_0 \in (0, 1)$, $q > 4$,
$a > 0$ and $A > \sqrt{a^2 + 1}\, M_q^{1/q}$. There is a constant $C > 0$ which only depends
on $\gamma$, $\alpha$, $\beta$, $\varepsilon_0$, $q$, $a$ and $A$ such that

$\sup_{\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)} E[\hat{\sigma}_0^2(\gamma) - \sigma_0^2]^2 \le C (n^{-2\beta} \log^{-(\alpha+2)}(n) + \log(n) \cdot n^{2\gamma - 1})$

and

$\sup_{\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)} E[\hat{u}_0(\gamma) - u_0]^2 \le C (n^{-2\beta} \log^{-(\alpha+1)}(n) + \log^2(n) \cdot n^{2\gamma - 1}).$

Taking $\gamma < 1/2 - \beta$ in Theorem 2.2, it then follows from Theorems 2.1
and 2.2 that the minimax rates of convergence for estimating the null parameters
$\sigma_0^2$ and $u_0$ are $n^{-2\beta} \log^{-(\alpha+2)}(n)$ and $n^{-2\beta} \log^{-(\alpha+1)}(n)$, respectively.
Furthermore, the estimators $\hat{\sigma}_0^2(\gamma)$ and $\hat{u}_0(\gamma)$ with $\gamma < 1/2 - \beta$ are rate
optimal. Different choices of $\gamma$ do not affect the convergence rate but may
affect the constant. In Section 4, we investigate how to choose $\gamma$ in practice
with simulated data. We find that in many situations, the mean squared error
is relatively insensitive to the choice of $\gamma$, provided that it falls in the range
of (0.15, 0.25).

We mention that the logarithmic term in the minimax risk bears some 
similarity with the conventional deconvolution problem. See Section 5 for 
further discussion. 

3. Estimating the proportion of nonnull effects. We now turn to the 
minimax estimation of the proportion. First, we consider the case where the 
null parameters are known. We show that, with careful modifications, the 
approach proposed in our earlier work [Jin and Cai (2007) and Jin (2008)] 
attains the optimal rate of convergence. We then extend the optimality to 
the case where the null parameters (uo,cr 2 ) are unknown. 
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3.1. Estimating the proportion when the null parameters are known. When 
the null parameters (uo>°o) are known, we can always use them to re- 
normalize the test statistics Xj. So without loss of generality, we assume 
uq = and oo = 1. As a result, the marginal density of Xj obeys a simplified 
form, 

(3.1) Xj li ~-{l-£)4>{x) + e J (p{^^jdH{u,a) = f. 

The problem of estimating the proportion has received much recent at- 
tention. See, for example, Storey (2002), Genovese and Wasserman (2004), 
Meinshausen and Rice (2006) [see also Efron et al. (2001) and Swanepoel 
(1999)]. A necessary condition for the consistency of several of these
approaches is that the marginal density of the nonnull effects (i.e., $f^{alt}$) is
pure, a notion introduced in Genovese and Wasserman (2004). Unfortunately,
the purity condition is generally not satisfied in the current setting;
see Jin (2008) for a detailed discussion.

In our recent work Jin and Cai (2007) and Jin (2008), we proposed a
Fourier approach to estimating the proportion which is described as follows.
Let $\omega$ be a bounded, continuous, and symmetric density function
supported in $(-1,1)$. Define a so-called phase function

$$\psi_n(t;\omega) = \psi_n(t;\omega,X_1,X_2,\ldots,X_n) = \int \omega(\xi)\,e^{t^2\xi^2/2}\,\varphi_n(t\xi)\,d\xi,$$

where as before $\varphi_n(t) = \frac{1}{n}\sum_{j=1}^n e^{itX_j}$ is the empirical characteristic function.
Fix $\gamma \in (0, 1/2)$ and let $t_n = t_n(\gamma)$ be as in (2.23); the estimator is defined
as

(3.2) $\hat\epsilon_n(\gamma;\omega) = \hat\epsilon_n(\gamma;\omega,X_1,X_2,\ldots,X_n) = 1 - \mathrm{Re}(\psi_n(t_n(\gamma);\omega)),$

where $\mathrm{Re}(z)$ stands for the real part of $z$. In Jin and Cai (2007) and Jin
(2008), three different choices of $\omega$ are recommended, namely the uniform
density, the triangle density and the smooth density that is proportional to
$\exp\big(-\frac{1}{1-\xi^2}\big)\cdot 1_{\{|\xi|<1\}}$.

The advantage of the Fourier approach is that it is no longer tied to
the purity condition and can be shown to be consistent for the proportion
uniformly for all eligible $H(u,\sigma)$; see details in Jin and Cai (2007) and Jin
(2008). Unfortunately, it is not hard to show that these estimators
are not rate optimal with any of these three choices of $\omega$.

In this paper, we propose the following estimator: 



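$$\hat\epsilon_n(\gamma) = \Big(1 - \frac{1}{n}\sum_{j=1}^n e^{t^2/2}\cos(tX_j)\Big)\Big|_{t=\sqrt{2\gamma\log n}} = 1 - \frac{1}{n^{1-\gamma}}\sum_{j=1}^n \cos\big(\sqrt{2\gamma\log n}\,X_j\big).$$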
In comparison, $\hat\epsilon_n(\gamma)$ is a special case of $\hat\epsilon_n(\gamma;\omega)$, where instead of being a
density function as in (3.2), $\omega$ is a point mass concentrated at 1. We shall
show that under mild conditions, the proposed estimator $\hat\epsilon_n(\gamma)$ attains the
optimal rate of convergence. In detail, fix $\alpha > 0$, $\beta \in [0,1/2)$, $\epsilon_0 \in (0,1)$,
$q > 2$, and $A > \sqrt{2}\,M_q^{1/q}$, and let $\eta_n = \epsilon_0 n^{-\beta}$ be as before. Consider the following
parameter space for the minimax theory on estimating the proportion:

(3.3) F = T{a,/3,e ,q,A;n) = {feJ c :e<r ln ,J \x\ q f(x) dx < A q 

The minimax risk for estimating the proportion when the null parameters 
are known is 

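(3.4) $R_n^{\epsilon,a} = R_n^{\epsilon,a}(\mathcal{F}(\alpha,\beta,\epsilon_0,q,A;n)) = \inf_{\hat\epsilon}\Big\{\sup_{\mathcal{F}(\alpha,\beta,\epsilon_0,q,A;n)} E[\hat\epsilon - \epsilon]^2\Big\}.$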

We have the following theorem. 

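Theorem 3.1. Fix $\gamma \in (0,1/2)$, $\alpha > 0$, $\beta \in [0,1/2)$, $\epsilon_0 \in (0,1)$, $q > 2$,
$A > \sqrt{2}\,M_q^{1/q}$. There is a generic constant $C > 0$ which only depends on $\alpha$,
$\beta$, $\epsilon_0$, $q$, $A$ and $\gamma$ such that for sufficiently large $n$,

$$R_n^{\epsilon,a}(\mathcal{F}(\alpha,\beta,\epsilon_0,q,A;n)) \ge C n^{-2\beta}\log^{-\alpha}(n)$$

and

$$\sup_{\mathcal{F}(\alpha,\beta,\epsilon_0,q,A;n)} E[\hat\epsilon_n(\gamma) - \epsilon]^2 \le C\big(n^{-2\beta}\log^{-\alpha}(n) + n^{2\gamma-1}\big).$$

In particular, if $\gamma < 1/2 - \beta$, then $\hat\epsilon_n(\gamma)$ attains the optimal rate of
convergence.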

The proof of Theorem 3.1 is similar to (but significantly simpler than) that of
Theorem 3.2 below, which deals with the case where the null parameters are
unknown. For reasons of space, we provide the proof of Theorem 3.2 in
Section 6 but omit that of Theorem 3.1.
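To make the estimator of Section 3.1 concrete, the following is a minimal sketch of $\hat\epsilon_n(\gamma)$ under the normalization $u_0 = 0$, $\sigma_0 = 1$. The function name `proportion_estimate` and the toy data (nonnull effects drawn from $N(2, 1.2^2)$) are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def proportion_estimate(x, gamma=0.2):
    """Estimate the nonnull proportion, assuming the null is N(0, 1)."""
    n = len(x)
    t = math.sqrt(2.0 * gamma * math.log(n))
    # exp(t^2 / 2) equals n**gamma, so (1/n) * exp(t^2 / 2) = n**(gamma - 1).
    cos_sum = sum(math.cos(t * xj) for xj in x)
    return 1.0 - cos_sum / n ** (1.0 - gamma)


# Illustrative data only: 20% nonnull effects drawn from N(2, 1.2^2).
rng = random.Random(0)
n, eps = 20000, 0.2
x = [rng.gauss(2.0, 1.2) if rng.random() < eps else rng.gauss(0.0, 1.0)
     for _ in range(n)]
print(round(proportion_estimate(x), 3))
```

The sketch only evaluates one cosine average at the single frequency $t = \sqrt{2\gamma\log n}$, which is why the procedure is cheap to compute; the finite-sample bias depends on the nonnull distribution, so the printed value should not be expected to equal 0.2 exactly.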

3.2. Estimating the proportion when the null parameters are unknown.
We now turn to the case where the null parameters are unknown. A natural
approach is to first estimate the null parameters with $(\hat\sigma_0(\gamma), \hat u_0(\gamma))$ and
then plug them into $\hat\epsilon_n(\gamma)$ to obtain an estimate of the proportion. In other
words, fixing $\gamma \in (0, 1/2)$, the plug-in estimator is

(3.5) $\hat\epsilon_n^*(\gamma) = 1 - \frac{1}{n^{1-\gamma}}\sum_{j=1}^n \cos\Big(\sqrt{2\gamma\log n}\,\frac{X_j - \hat u_0(\gamma)}{\hat\sigma_0(\gamma)}\Big).$

We consider the minimax risk over the parameter space $\mathcal{F}_0$. The minimax
risk for estimating the proportion when the null parameters are unknown is
then

(3.6) $R_n^{\epsilon,b} = R_n^{\epsilon,b}(\mathcal{F}_0(\alpha,\beta,\epsilon_0,q,a,A;n)) = \inf_{\hat\epsilon}\Big\{\sup_{\mathcal{F}_0(\alpha,\beta,\epsilon_0,q,a,A;n)} E[\hat\epsilon - \epsilon]^2\Big\}.$

The following theorem, proved in Section 6, shows that the plug-in estimator 
is rate optimal. 

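Theorem 3.2. Fix $\gamma \in (0, 1/2)$, $\alpha > 2$, $\beta \in [0, 1/2)$, $\epsilon_0 \in (0, 1)$,
$q > 4 + 2\gamma$, $a > 0$ and $A > \sqrt{a^2 + 1}\,M_q^{1/q}$. There is a generic constant $C > 0$ which
only depends on $\gamma$, $\alpha$, $\beta$, $\epsilon_0$, $q$, $a$ and $A$ such that for sufficiently large $n$,

$$R_n^{\epsilon,b} \ge C n^{-2\beta}\log^{-\alpha}(n)$$

and

$$\sup_{\mathcal{F}_0(\alpha,\beta,\epsilon_0,q,a,A;n)} E[\hat\epsilon_n^*(\gamma) - \epsilon]^2 \le C\big(n^{-2\beta}\log^{-\alpha}(n) + \log^3(n)\cdot n^{2\gamma-1}\big).$$

In particular, if $\gamma < 1/2 - \beta$, then $\hat\epsilon_n^*(\gamma)$ attains the optimal rate of convergence.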

Comparing Theorem 3.2 with Theorem 3.1, we see that except for the small
difference in the upper bound [one has the $\log^3(n)$ term and the other does
not], the minimax rates of convergence are the same whether the null
parameters are known or not. The $\log^3(n)$ factor is the price we pay for the extra
variability in estimation when the null parameters are unknown. Therefore,
the plug-in estimator $\hat\epsilon_n^*(\gamma)$ given in (3.5) is rate-optimal under almost the
same conditions as in the case where the null parameters are known.
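The plug-in step in (3.5) can be sketched as below. The null estimates are passed in as arguments because the estimators $\hat u_0(\gamma)$ and $\hat\sigma_0(\gamma)$ are defined in Section 2 and are not reproduced here; the names `plugin_proportion_estimate`, `u0_hat` and `sigma0_hat` are hypothetical.

```python
import math


def plugin_proportion_estimate(x, u0_hat, sigma0_hat, gamma=0.2):
    """Plug-in proportion estimate: standardize by the estimated null, then
    apply the same cosine average used when the null is known."""
    if sigma0_hat <= 0:
        raise ValueError("sigma0_hat must be positive")
    n = len(x)
    t = math.sqrt(2.0 * gamma * math.log(n))
    cos_sum = sum(math.cos(t * (xj - u0_hat) / sigma0_hat) for xj in x)
    return 1.0 - cos_sum / n ** (1.0 - gamma)
```

By construction the estimate is unchanged when the data and the null estimates are shifted and rescaled together, which is one way to sanity-check an implementation.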

4. Simulation study. The procedures for estimating the proportion and 
null parameters presented in Sections 2 and 3 are easy to implement. In this 
section, we investigate the numerical performance of the procedure with 
simulated data. 

The numerical study has several goals. The first is to consider the effect
of the tuning parameter $\gamma$ on the mean squared error (MSE) of the estimators
and to make a recommendation on the choice of $\gamma$. The second is to compare
the performance of the estimators with different n. The third is to compare 
the procedure with those in the literature. Several different combinations 
of the proportion and the nonnull distributions are used for such compar- 
isons. The fourth is to investigate the performance of the estimators when 
the assumptions on eligibility and independence do not hold. The last and 
the most important goal is to study the effect of the estimation accuracy 
over the subsequent multiple testing procedures. Along this line, we con- 
sider two specific multiple testing procedures in our numerical study. One 
is the adaptive p-value based procedure (AP) introduced in Benjamini and
Hochberg (2000) which requires an estimate of the proportion $\epsilon$. This is
the original Benjamini-Hochberg step-up procedure with an adjusted FDR
level accounting for the sparsity. Another is the AdaptZ procedure (AZ)
proposed in Sun and Cai (2007). This procedure thresholds the ranked Lfdr
statistic (1.3) and requires estimates of $\epsilon$, $f$ and $f^{null}$. The procedure is
asymptotically optimal in the sense that it minimizes the false nondiscovery
rate asymptotically when the estimators of $\epsilon$, $f$ and $f^{null}$ are consistent.

Unless specified otherwise, the simulation results given in this section are 
based on n = 10,000, 1000 replications and the following Gaussian mixture 
model: 

(4.1) $X_i \sim (1-\epsilon)N(\mu_0,\sigma_0^2) + \frac{\epsilon}{2}N(\mu_{1i},\sigma^2) + \frac{\epsilon}{2}N(\mu_{2i},\sigma^2),$

where $\mu_{1i}$ and $\mu_{2i}$ are drawn from some distributions that may change from
one case to another. Below, we report the simulation results along the
five aforementioned directions.

First, we study the effect of the tuning parameter 7 on the performance 
of the estimators. To this end, we consider the following setting. 

Setting 1. We take $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5), $\epsilon = 0.2$ and $\sigma = 1.2$.

Table 1 tabulates the MSE of the three estimators $\hat\epsilon_n^*(\gamma)$, $\hat\mu_0(\gamma)$ and $\hat\sigma_0^2(\gamma)$.
The results suggest that $\hat\epsilon_n^*$ and $\hat\sigma_0^2$ perform well in terms of the MSE when
$\gamma$ is in a neighborhood of 0.2, ranging from 0.14 to 0.26 (note, however, that
the estimator $\hat\mu_0$ favors a smaller $\gamma$). Additional simulations show similar
patterns. In light of this, we conclude that an overall good choice is $\gamma = 0.2$.
We recommend this choice for practical use in general, and use it in the rest
of the simulation study in this paper.
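For readers who want to reproduce data of this type, the mixture (4.1) under Setting 1 can be simulated along the following lines. This is a sketch, not the authors' simulation code; the function name `simulate_mixture` and its signature are assumptions made for illustration.

```python
import random


def simulate_mixture(n, eps, sigma, mu0=0.0, sigma0=1.0, seed=None):
    """Draw n observations from model (4.1) with the Setting 1 mean ranges.

    Returns the data and a parallel list flagging which draws are null.
    """
    rng = random.Random(seed)
    x, is_null = [], []
    for _ in range(n):
        u = rng.random()
        if u < 1.0 - eps:
            # Null component N(mu0, sigma0^2).
            x.append(rng.gauss(mu0, sigma0))
            is_null.append(True)
        elif u < 1.0 - eps / 2.0:
            # First nonnull component, mean drawn from Uniform(-0.9, -0.1).
            x.append(rng.gauss(rng.uniform(-0.9, -0.1), sigma))
            is_null.append(False)
        else:
            # Second nonnull component, mean drawn from Uniform(0.5, 1.5).
            x.append(rng.gauss(rng.uniform(0.5, 1.5), sigma))
            is_null.append(False)
    return x, is_null


x, is_null = simulate_mixture(n=10000, eps=0.2, sigma=1.2, seed=1)
print(len(x), sum(is_null) / len(x))
```

Keeping the null indicator alongside the data is a convenience for later steps, such as computing the false discovery proportion of a testing procedure.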

Second, we investigate how the number of hypotheses n affects the esti- 
mation accuracy. The setting we consider is the same as Setting 1, but with 
different n. 

Setting 2. We take $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5), $\epsilon = 0.2$, $\sigma = 1.2$ and $n$ ranges from 2000 to 500,000.



Table 1

MSE (in units of $10^{-4}$) of the estimators $\hat\epsilon_n^*(\gamma)$, $\hat\mu_0(\gamma)$ and $\hat\sigma_0^2(\gamma)$ for different $\gamma$

$\gamma$                   0.08   0.11   0.14   0.17   0.20   0.23   0.26   0.29   0.32   0.35   0.38
MSE($\hat\epsilon_n^*$)    15.1   11.8   8.58   5.90   4.14   3.81   6.33   16.5   46.1   91.6   142
MSE($\hat\mu_0$)           0.37   0.93   1.79   3.11   5.40   9.65   17.8   33.3   63.0   114    204
MSE($\hat\sigma_0^2$)      2.31   1.57   1.07   0.78   0.68   0.77   1.08   1.70   2.83   4.89   8.84

Table 2

Comparison of MSE (in units of $10^{-5}$) for different $n$ under Setting 2.
The tuning parameter $\gamma$ is set at 0.2

$n$                        2000    5000    10,000   15,000   20,000   50,000   100,000   500,000
MSE($\hat\epsilon_n^*$)    306.6   102.6   43.9     26.1     17.7     4.6      1.7       0.2
MSE($\hat\mu_0$)           596.6   143.8   60.5     31.7     19.3     5.8      1.9       0.2
MSE($\hat\sigma_0^2$)      74.6    19.6    7.1      3.95     2.5      0.6      0.2       0.01



Table 2 summarizes the MSE of the estimators under Setting 2. The 
results show that the accuracy of the estimators improves quickly as n in- 
creases. 

We now move to our third goal and compare the proposed estimator
for the proportion with those in the literature, namely Efron's estimator
$\hat\epsilon^E$ [Efron (2004)] and Storey's estimator $\hat\epsilon^S$ [Storey (2002), Genovese and
Wasserman (2004)], assuming the null distribution is known. To distinguish
from $\hat\epsilon_n(\gamma)$, we denote the special case of $\gamma = 0.2$ by

$$\hat\epsilon_n^{CJ} = \hat\epsilon_n(0.2)$$

and may drop the subscript $n$ for simplicity. We compare these three
estimators with data generated with different proportions $\epsilon$ (Setting 3a) and
different heteroscedasticity parameters $\sigma$ (Setting 3b).

Setting 3a. We take $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5) and $\sigma = 1.2$. The value of $\epsilon$ varies from 0.03 to 0.30. The
goal is to see how the performance of the three estimators depends on the
sparsity.

Setting 3b. We set $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5) and $\epsilon = 0.2$. The value of $\sigma$ varies from 1.2 to 2.1. The goal
is to study the effect of the nonnull distribution on the estimation accuracy
of the proportion estimators.

Table 3 tabulates the MSEs of these three point estimators. It is clear that
our estimator $\hat\epsilon^{CJ}$ performs well uniformly in all the cases. In particular it
is robust under the various settings of nonnull distribution and sparsity.
Table 3 shows that the MSE of $\hat\epsilon^{CJ}$ increases gradually from $5.7 \times 10^{-5}$
to $10.1 \times 10^{-5}$ as $\epsilon$ increases from 0.03 to 0.30. In comparison, the other
two estimators $\hat\epsilon^S$ and $\hat\epsilon^E$ perform well in the sparse case but poorly in the
nonsparse case. The MSEs of $\hat\epsilon^E$ and $\hat\epsilon^S$ increase about 120 times and 80
times, respectively, and they can sometimes be more than 10 times (sometimes
even 39 times) larger than the MSE of $\hat\epsilon^{CJ}$.
Table 3

Comparison of MSE (in units of $10^{-5}$) of the three point estimators $\hat\epsilon^{CJ}$, $\hat\epsilon^E$ and $\hat\epsilon^S$

Setting 3a

$\epsilon$                  0.03   0.06   0.09   0.12   0.15   0.18   0.21   0.24   0.27   0.30
MSE($\hat\epsilon^{CJ}$)    5.7    7.7    9.0    9.9    9.3    10.3   10.0   11.2   11.5   10.1
MSE($\hat\epsilon^{E}$)     3.3    14.6   33.4   60.3   95.8   139    190    249    316    394
MSE($\hat\epsilon^{S}$)     2.4    8.9    19.5   32.9   49.9   72.8   99.7   130    163    195

Setting 3b

$\sigma$                    1.2    1.3    1.4    1.5    1.6    1.7    1.8    1.9    2.0    2.1
MSE($\hat\epsilon^{CJ}$)    67.3   53.7   41.8   31.7   24.0   17.6   13.2   9.4    7.0    4.8
MSE($\hat\epsilon^{E}$)     172    164    153    146    138    129    122    114    108    100
MSE($\hat\epsilon^{S}$)     89.0   81.6   72.2   67.7   61.9   55.4   50.3   46.7   43.5   41.0



Next, we consider the case where either the assumption on eligibility or
the assumption on independence is violated. Consider the eligibility
assumption first. Denote by DE($\mu$, $\tau$) the double exponential distribution with the
density function $f(x;\mu,\tau) = \frac{1}{2\tau}e^{-|x-\mu|/\tau}$. We shall generate $X_i$ as

(4.2) $X_i \sim (1-\epsilon)N(\mu_0,\sigma_0^2) + \frac{\epsilon}{2}\mathrm{DE}(\mu_{1i},\tau) + \frac{\epsilon}{2}\mathrm{DE}(\mu_{2i},\tau).$

Since the double exponential can be viewed as a scale Gaussian mixture
[West (1987)], it is seen that the eligibility condition does not hold. Two
different settings are considered.

Setting 4a. We take $\mu_0 = 0$, $\sigma_0 = 1$ and assume the null parameters
$\mu_0$ and $\sigma_0$ are known. First generate $\mu_{1i}$ from $U(-0.9, -0.1)$ and $\mu_{2i}$ from
$U(0.5, 1.5)$, then generate $X_i$ as in (4.2) with $\tau = 1.2$. The proportion $\epsilon$ varies
from 0.03 to 0.30.

Setting 4b. We take $\mu_0 = 0$, $\sigma_0 = 1$ and assume the null parameters
$\mu_0$ and $\sigma_0$ are unknown. First generate $\mu_{1i}$ from $U(-0.9, -0.1)$ and $\mu_{2i}$ from
$U(0.5, 1.5)$, then generate $X_i$ as in (4.2) with $\epsilon = 0.2$. The value of $\tau$ varies
from 1.2 to 2.1.

Table 4 gives the MSEs in Settings 4a and 4b. In Setting 4a, Efron's
method is often found to be divergent numerically and is thus excluded
from comparison. For small $\epsilon$, Storey's method and our method yield
similar results and both perform well. For moderate to large $\epsilon$, however, our
method performs markedly better. In Setting 4b, Efron's method is
again found to be divergent, and Storey's method does not apply as it
requires the information of the null parameters. We therefore exclude both of
them from comparison. In both settings, although the eligibility condition
is violated, our method continues to perform well.

Table 4

MSE (in units of $10^{-4}$) in Settings 4a and 4b

Setting 4a

$\epsilon$                  0.03    0.06    0.09    0.12    0.15    0.18    0.21    0.24    0.27    0.30
MSE($\hat\epsilon^{CJ}$)    8.17    7.28    6.35    5.65    4.92    4.20    3.78    3.02    2.51    2.01
MSE($\hat\epsilon^{S}$)     3.25    6.79    9.76    14.35   19.93   19.69   23.68   21.67   21.01   20.18

Setting 4b

$\tau$                      1.2     1.3     1.4     1.5     1.6     1.7     1.8     1.9     2.0     2.1
MSE($\hat\epsilon_n^*$)     11.9    10.7    9.7     8.7     7.9     7.1     6.5     5.8     5.3     4.8
MSE($\hat\mu_0$)            0.16    0.18    0.19    0.18    0.19    0.19    0.20    0.22    0.23    0.23
MSE($\hat\sigma_0^2$)       4.1     4.1     4.2     4.2     4.0     3.9     3.7     3.6     3.5     3.3

The unsatisfactory behavior of Efron's estimator and Storey's estimator 
can be explained as follows. It is known in the literature that a necessary 
condition for Efron's estimator or Storey's estimator to be consistent is that 
the alternative density has a thinner tail than that of the null density either 
to the left or to the right [this is the so-called purity condition; see, e.g., 
Genovese and Wasserman (2004), Jin and Cai (2006) and Jin (2008)]. In 
Settings 4a and 4b, due to the heavy tail of the double exponential density, 
the purity condition is violated. It can be shown that asymptotically the bias 
of either Efron's estimator or Storey's estimator has the same magnitude as 
that of the true proportion. This explains why Efron's method does not
always converge, and why Storey's method has a reasonable performance when
the underlying proportion is small, but behaves increasingly unsatisfactorily
as the proportion gets larger. This also suggests that, when the alternative
density has a heavy tail, relying on the tail area for inference (as in
Efron's and Storey's methods) can lead to a large bias. A promising alternative
is the proposed Fourier-based method.

We now consider a case where the assumption on independence is violated.
To do so, let $L$ be an integer that ranges from 0 to 50 with an increment of 10.
For each $L$, we generate $n + L$ samples $w_1, w_2, \ldots, w_{n+L}$ from $N(0, 1)$, then
let $z_j = \frac{1}{\sqrt{L+1}}\sum_{i=j}^{j+L} w_i$. The samples $z_j$ generated in this way are blockwise
dependent with a block size $L$ (note that $L = 0$ corresponds to the independent
case). The setting we consider is as follows, where the null parameters are
assumed to be unknown.

Setting 4c. Fix $\epsilon = 0.2$ and $\sigma = 1.2$. Generate $X_i = z_i$ for $i = 1, 2, \ldots,$
8000, $X_i = \mu_{1i} + \sigma z_i$ for $8001 \le i \le 9000$, and $X_i = \mu_{2i} + \sigma z_i$ for $9001 \le i \le$
10,000, where $\mu_{1i}$ are drawn from $U(-0.9, -0.1)$ and $\mu_{2i}$ from $U(0.5, 1.5)$.
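The blockwise-dependent noise used in Setting 4c can be generated with a sliding window over i.i.d. normals, as sketched below. The sketch assumes that each window sum is divided by $\sqrt{L+1}$ so that every $z_j$ is marginally $N(0,1)$; the function name `blockwise_dependent_noise` is hypothetical.

```python
import math
import random


def blockwise_dependent_noise(n, L, seed=None):
    """Return z_1..z_n, where z_j is a scaled sum of w_j, ..., w_{j+L}.

    Assumption: the sum is scaled by 1/sqrt(L + 1) to keep unit variance.
    Observations more than L apart share no w_i and are independent.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n + L)]
    scale = 1.0 / math.sqrt(L + 1)
    window = sum(w[:L + 1])  # running sum of the current block
    z = []
    for j in range(n):
        z.append(scale * window)
        if j + 1 < n:
            # Slide the block one step: add the new term, drop the oldest.
            window += w[j + L + 1] - w[j]
    return z


z = blockwise_dependent_noise(n=20000, L=10, seed=3)
sample_var = sum(v * v for v in z) / len(z)
print(round(sample_var, 2))
```

With `L = 0` the window contains a single term, so the output reduces to the independent case.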
Table 5

MSE (in units of $10^{-3}$) in Setting 4c

$L$                         0      10     20     30     40      50
MSE($\hat\epsilon^{CJ}$)    8.8    10.3   16.6   25.2   34.7    43.2
MSE($\hat\mu_0^{CJ}$)       10.4   37.5   63.8   94.4   131.7   150.0
MSE($\hat\sigma_0^{CJ}$)    5.4    13.5   23.0   34.8   49.3    52.1
MSE($\hat\epsilon^{E}$)     34.3   34.1   33.5   33.2   33.2    32.3
MSE($\hat\mu_0^{E}$)        1.2    2.8    4.0    5.4    7.0     8.8
MSE($\hat\sigma_0^{E}$)     14.7   18.1   21.7   28.1   34.7    33.5


Table 5 summarizes the results. In terms of MSE, the estimation accuracy
decreases as the range of dependence increases. However, the MSEs are
still relatively small, especially those corresponding to the proportion and the null
variance parameter $\sigma_0^2$. In comparison to Efron's method, correlation has a
relatively larger impact on our method. The performance of our estimation
procedure is better than Efron's when the correlation is weak to moderate.
However, Efron's method is better when the correlation is strong.

The insight lies in the effect of correlation on the bias and variance.
For all these estimators, the bias contains mainly marginal effects so the
correlation does not have much effect on it. The correlation, however, may
have an important effect on the variance [see Jin and Cai (2006) and Jin (2008)].
In comparison, although our method has a smaller bias, it gives a relatively
larger MSE because it has a larger variance and is relatively more vulnerable
when the correlation is strong.

Finally, we investigate how the point estimators affect the results of subse- 
quent multiple testing procedures. First, we use the adaptive p- value based 
procedure [Benjamini and Hochberg (2000)] to compare the effect of the 
three point estimators of the proportion in the subsequent multiple testing. 
To this end, we consider the following two settings (which are the same as 
Setting 3a and 3b, respectively, but we restate them to avoid confusion). 

Setting 5a. We take $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5) and $\sigma = 1.2$. The value of $\epsilon$ varies from 0.03 to 0.30.

Setting 5b. We set $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5) and $\epsilon = 0.2$. The value of $\sigma$ varies from 1.2 to 2.1.

It is known that the original step-up procedure of Benjamini and Hochberg
(1995) is conservative: it controls the FDR at level $(1-\epsilon)\alpha$ instead of
the nominal level $\alpha$. To remedy this shortcoming, Benjamini and Hochberg
(2000) proposed an adaptive BH procedure which applies the original step-up
procedure at level $\alpha' = \alpha/(1-\hat\epsilon)$ instead of $\alpha$, where $\hat\epsilon$ is an estimate of
$\epsilon$. Clearly the true FDR level of the adaptive BH procedure depends on the
estimation accuracy of $\hat\epsilon$.
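The adjustment described above amounts to running the usual step-up rule at an inflated level. A minimal sketch follows; the function names are hypothetical, and capping the adjusted level at 1 is a safeguard added here rather than something stated in the text.

```python
def bh_rejections(pvalues, level):
    """Benjamini-Hochberg step-up rule: return indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: remember the largest rank whose p-value is under its threshold.
        if pvalues[i] <= rank * level / m:
            k = rank
    return sorted(order[:k])


def adaptive_bh_rejections(pvalues, alpha, eps_hat):
    """Apply BH at level alpha / (1 - eps_hat), given a proportion estimate."""
    if not 0.0 <= eps_hat < 1.0:
        raise ValueError("eps_hat must lie in [0, 1)")
    # Capping at 1 is an added safeguard, not part of the description above.
    return bh_rejections(pvalues, min(1.0, alpha / (1.0 - eps_hat)))


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
print(bh_rejections(pvals, 0.05))
print(adaptive_bh_rejections(pvals, 0.05, eps_hat=0.5))
```

In this toy example a larger estimate of the proportion raises the working level and therefore can only enlarge the rejection set, which is why an underestimate of $\epsilon$ leads to a conservative procedure.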

We now compare the actual FDR level of the adaptive BH procedure using
$\hat\epsilon^{CJ}$, $\hat\epsilon^S$ and $\hat\epsilon^E$. In addition we also use the deviations of the false discovery
proportion (FDP) from the nominal FDR level as a measure of the accuracy
of the testing procedure. The FDP is a notion that is closely related to
FDR: the FDP is the proportion of false positives among all rejections, and
the FDR is the expected value of the FDP; see, for example, Genovese and
Wasserman (2004). The deviations of the FDP from the nominal FDR level
are naturally summarized by mean squared error. Denote the FDP of the
adaptive BH procedure with the proportion being estimated by $\hat\epsilon^E$, $\hat\epsilon^S$ and
$\hat\epsilon^{CJ}$ by FDP$^E$, FDP$^S$ and FDP$^{CJ}$.
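The two summaries used here, the FDP of a single realization and the mean squared deviation of the FDP from the nominal level over replications, can be computed as in the sketch below. The function names are hypothetical, and defining the FDP as 0 when nothing is rejected is a common convention assumed here.

```python
def false_discovery_proportion(rejected, is_null):
    """Fraction of rejected indices that are true nulls (0 if nothing is rejected)."""
    if not rejected:
        return 0.0
    false_positives = sum(1 for i in rejected if is_null[i])
    return false_positives / len(rejected)


def fdp_mse(fdp_values, nominal_level):
    """Mean squared deviation of realized FDPs from the nominal FDR level."""
    return sum((v - nominal_level) ** 2 for v in fdp_values) / len(fdp_values)


print(false_discovery_proportion([0, 1, 2, 3], [True, False, False, False, True]))
```

Averaging `false_discovery_proportion` over replications gives a Monte Carlo estimate of the actual FDR level, while `fdp_mse` captures how much individual realizations scatter around the nominal level.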

Figure 2 compares the actual FDR levels as well as the MSEs of FDP$^E$,
FDP$^S$ and FDP$^{CJ}$. The two right panels are the ratios of the MSEs of FDP$^E$,
FDP$^S$ and FDP$^{CJ}$ to MSE(FDP$^{CJ}$). In each of these settings, overall, the
true FDR level of the adaptive BH procedure using $\hat\epsilon^{CJ}$ is closest to the
nominal level. The other two estimators, $\hat\epsilon^E$ and $\hat\epsilon^S$, tend to underestimate
the proportion $\epsilon$ and consequently yield conservative testing procedures with
the true FDR level below the nominal value. The FDP plots indicate that
overall FDP$^E$ has larger deviations from the nominal FDR level in individual
realizations than FDP$^S$, which in turn has larger deviations than FDP$^{CJ}$.
These results show that our estimator $\hat\epsilon^{CJ}$ yields the most accurate testing
procedure: compared to FDP$^S$ and FDP$^E$, FDP$^{CJ}$ has not only a smaller
bias, but also a smaller variance.

Next, we compare again our estimator of the null parameters with that by
Efron (2004). But this time we do so by investigating the effect of different
point estimators on the subsequent multiple testing procedure, namely
the AdaptZ procedure by Sun and Cai (2007). In detail, we consider the
following setting.

Setting 5c. We take $\mu_0 = 0$, $\mu_{1i} \sim$ Uniform(-0.9, -0.1), $\mu_{2i} \sim$
Uniform(0.5, 1.5), $\epsilon = 0.2$, and $\sigma = 1.3$. The value of $\sigma_0$ varies from 0.5 to 1.
In this setting we estimate both the proportion $\epsilon$ and the null parameters
$\mu_0$ and $\sigma_0$.

We now compare the performance of our estimators of the proportion and
the null parameters with those of Efron (2004). [Storey (2002) assumed a
known null distribution and did not provide estimators for the null
parameters, so we exclude it from the comparison.] We compare the performance
of these estimators as measured by the accuracy of the actual FDR level
of the adaptive testing procedure introduced in Sun and Cai (2007). The
AdaptZ procedure given in Sun and Cai (2007) aims to minimize the false
nondiscovery rate subject to the constraint that the FDR level is controlled
at a pre-specified level. This procedure thresholds the ordered Lfdr statistic

$$\widehat{\mathrm{Lfdr}}(z_i) = (1-\hat\epsilon)\hat f^{null}(z_i)/\hat f(z_i),$$

where $\hat f^{null}$ and $\hat f$ are estimators of $f^{null}$ and $f$, respectively. The marginal
density $f$ is estimated by a kernel density estimator with bandwidth chosen
by cross-validation.
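As a rough illustration of how the estimated Lfdr statistic can be formed and thresholded, consider the sketch below. It departs from the description above in ways that should be read as assumptions: the bandwidth `h` is supplied by the user instead of being chosen by cross-validation, the null density is taken to be normal with the estimated parameters, the Lfdr values are truncated at 1, and the step-up rule (reject the hypotheses with the smallest Lfdr values as long as their running average stays below the target level) is our reading of Sun and Cai (2007), which this section does not spell out. All function names are hypothetical.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def estimated_lfdr(z, eps_hat, mu0_hat, sigma0_hat, h):
    """(1 - eps_hat) * f_null_hat(z_i) / f_hat(z_i), truncated at 1.

    f_hat is a Gaussian kernel density estimate with fixed bandwidth h.
    """
    n = len(z)
    values = []
    for zi in z:
        f_hat = sum(normal_pdf((zi - zj) / h) for zj in z) / (n * h)
        ratio = (1.0 - eps_hat) * normal_pdf(zi, mu0_hat, sigma0_hat) / f_hat
        values.append(min(1.0, ratio))
    return values


def lfdr_stepup(lfdr_values, alpha):
    """Reject the k smallest Lfdr values, where k is the largest count whose
    running average of ordered Lfdr values does not exceed alpha."""
    order = sorted(range(len(lfdr_values)), key=lambda i: lfdr_values[i])
    running, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += lfdr_values[i]
        if running / rank <= alpha:
            k = rank
    return sorted(order[:k])


z = [0.0, 0.1, -0.2, 0.3, -0.1, 5.0]
scores = estimated_lfdr(z, eps_hat=0.2, mu0_hat=0.0, sigma0_hat=1.0, h=0.5)
print(lfdr_stepup(scores, alpha=0.1))
```

The kernel estimate here costs $O(n^2)$ operations, which is fine for a toy example but would call for binning or a library routine at the sample sizes used in the simulations.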

Figure 3 compares the true FDR levels of the AdaptZ procedure using our
estimators of $\epsilon$ and $f^{null}$ with those of the same procedure using the estimators of
$\epsilon$ and $f^{null}$ given in Efron (2004). Figure 3 also displays the ratio of the MSEs
of the FDP of the two testing procedures, MSE(FDP$^E$)/MSE(FDP$^{CJ}$). The
results clearly show that the true FDR level of the testing procedure with
our estimators is much closer to the nominal level than that with the
estimators given in Efron (2004), and that the FDP has smaller deviations from the
nominal FDR level. Indeed, MSE(FDP$^{CJ}$) can sometimes be 15 times
smaller than MSE(FDP$^E$) [see Panel (b)].

Fig. 2. The actual FDR levels (left panels) and the MSEs of the FDP (right panels) of
the adaptive BH procedure using the proportion estimators $\hat\epsilon^E$ (o line), $\hat\epsilon^S$ (△ line) and
$\hat\epsilon^{CJ}$ (+ line). The nominal level is 0.10. Top row: Setting 5a. The horizontal axis is the
proportion $\epsilon$. Bottom row: Setting 5b. The horizontal axis is the parameter $\sigma$.

Fig. 3. The actual FDR levels (left panel) and the relative MSEs of the FDP (right
panel) of the AdaptZ procedure using the estimated null parameters and proportion: Efron's
estimators (o line) and our estimators (△ line). The nominal FDR level is 0.10 and the
horizontal axis is the parameter $\sigma_0$.

We conclude this section by mentioning that the proposed estimators 
usually yield a more accurate point estimation for the proportion and the 
null parameters than those by Efron (2004) and Storey (2002), not only 
asymptotically, but also for finite n. The accuracy of the proportion and the 
null parameters directly affects the performance of the subsequent testing 
procedures. Our estimators yield more accurate testing results than those
by Efron (2004) and Storey (2002).

5. Discussion. We derived the optimal rates of convergence for estimating
the null parameters and the proportion of nonnull effects in large-scale
multiple testing using a Gaussian mixture model. It was shown that the
convergence rates depend on the smoothness of the mixing density $h(u|\sigma)$.
The empirical characteristic function and Fourier analysis are crucial tools
in our analysis of the optimality results. The proposed estimators not only
are asymptotically rate-optimal but also enjoy superior finite $n$ performance.
Both theoretical and numerical results show that these estimators outperform
the commonly used estimators in the literature. The improvement in
the parameter estimation leads directly to more precise results in the
subsequent multiple testing.

The minimax rates of convergence are proportional to the square of the
true proportion multiplied by some logarithmic factors. The slowly
convergent logarithmic factors can be attributed to the super-smooth nature of
the Gaussian density, which leads to the thin-tailed behavior of the
corresponding characteristic function. As a result, even a relatively large
perturbation in the true null parameters or in the true proportion may only
result in a small difference in the $L^2$-norm of the characteristic function,
which makes the perturbation hard to detect. The logarithmic terms are
reminiscent of those found in the study of the conventional nonparametric
deconvolution with Gaussian errors [e.g., Zhang (1990) and Fan (1991)],
where the culprit for the slow convergence is also the super-smoothness of
the Gaussian density. However, we should note that the problem considered
here is different from the deconvolution problem; this explains the difference
in the rate of minimax risk, the need for new procedures and the need for
new approaches to derive the minimax risk bounds.

The work presented in this paper can be extended in several directions.
First, while we have focused on the case where the characteristic function
of $h$ decays at a polynomial rate, the results can be conveniently extended
to the case where it has an exponential tail. Consider, for example, the
following case:

$$|\hat h(t)| \le C\exp(-|t|^{\alpha}).$$

The bias of the proposed estimator for the null parameters (and that for the
proportion is similar) is of the order of

$$\exp(-C\log^{\alpha/2}(n)).$$

When $0 < \alpha < 2$, the bias is still larger than the variance and the rate of
convergence is basically $\exp(-C\log^{\alpha/2}(n))$. When $\alpha > 2$, the bias tends to
0 faster than $1/\sqrt{n}$. In this case, the variance dominates the MSE, and we
have an $O(1/n)$ convergence rate. Second, while we focus on the case where
the $X_j$ are independent, extensions to the case of weak dependence are
possible. Jin and Cai (2007) considered two dependence structures, the strongly
$\alpha$-mixing case and the short-range dependent case, and showed that the
estimators constructed in that paper continue to be uniformly consistent under
these dependent settings; see details therein. We expect that some of the
results given in this paper are also extendable to the weakly dependent case.
Third, while we focus on Gaussian mixtures in this paper, extensions to
non-Gaussian mixtures are possible; see Jin (2008) for more discussion. An
interesting example along this line is to replace the Gaussian mixture by
the Laplace mixture. Due to the singularity of the Laplace density around
the origin, the associated characteristic function decays much more slowly than
that of the Gaussian density. As a result, the minimax risks for estimating
the null parameters and the proportion are expected to have faster rates
of convergence than those presented here. Last, while we focus on squared
error loss here, the results can be extended to general loss functions.

We conclude this section by mentioning some possible future research
directions. First, two key assumptions we make in this paper are the
Gaussian mixture structure of the marginal density of the z-scores, and the
independence among different z-scores. An interesting direction is to study
the extent to which the presented results continue to hold when these
assumptions are violated. An equally interesting direction is to study how
to normalize/pre-process the data such that the assumptions hold
approximately. Given the considerable efforts on normalization and pre-processing
by the gene microarray community in recent years, the research along this
direction could be very fruitful. Second, it would also be interesting to
develop an adaptive approach to select the tuning parameter $\gamma$ in our proposed
procedure. Given the overwhelming practical interest in large-scale multiple
testing, this is an interesting problem for further study.

6. Proof of the main results. In this section, we prove the main results: 
Theorems 2.1, 2.2 and 3.2. 

6.1. Proof of Theorem 2.1. The proofs of the minimax lower bounds for
estimating the null parameters $\sigma_0^2$ and $u_0$ are similar. We present a detailed
proof for the first claim and only a brief outline for the second one.

Consider the first claim. The key is to flesh out the ideas sketched in
Section 2.1. We begin by filling in the details of the construction of $w_1$ and
$w_2$. Let $k$ be the smallest even number that is greater than $2q + 1$, let



and let $s_1$ and $s_2$ be two symmetric smooth functions, where $s_1$ satisfies
(1) $0 \le s_1(t) \le 1$, (2) $s_1(t) = 1$ when $||t| - 1| \ge 2/3$, and (3) $s_1(t) = 0$ when
$||t| - 1| \le 1/3$, and $s_2$ satisfies (1) $0 \le s_2(t) \le 1$, (2) $s_2(t) = 1$ when
$0 \le |t| \le \tau_n + 1/3$, and (3) $s_2(t) = 0$ when $|t| \ge \tau_n + 2/3$. The existence of such smooth
functions is well known in the literature; see Erdelyi (1956), for example. We
construct $w_1$ and $w_2$ through their characteristic functions by




0<£ < 1 



(6.1) 



w x {t) = Sl {t)i{t) 
and

(6.2) w 2 (t) = s 2 (t) • {e^l^Mt) + Q— 1 ) ~ !]) 5 

see Figure 1 for illustrations. 

Now, to show the claim, it remains to show (a) $h_1$ and $h_2$ are indeed
densities, (b) the $\chi^2$-distance between $f_1$ and $f_2$ is equal to $o(1/n)$ and (c)
the densities $f_1$ and $f_2$ in (2.9) and (2.10) satisfy the constraints (2.2) and
(2.3) and therefore live in $\mathcal{F}_0(\alpha,\beta,\epsilon_0,q,a,A;n)$. To do so, we need some
lemmas.

Let $g$ be the Gaussian mixing density

$$g(x) = g(x;w_1,a) = \int \frac{1}{a}\phi\Big(\frac{x-u}{a}\Big)w_1(u)\,du.$$

By the way $f_1$ is defined [see (2.9)], it is not hard to see that

(6.3) fx(x) = (1 - r] n )4> a (x) + Vn4>^rp[{x) + ^r] n g{x), 

where 4> a denotes the density of ^(0, a 2 ). The following lemma characterizes 
the tail behavior of w\, and so that of g and f±. 

Lemma 6.1. For large $|u|$, $w_1(u) \sim |u|^{-k}$. As a result, for sufficiently
small $\vartheta_0 > 0$, there is a constant $C > 0$ such that

(6.4) $|g(x)| \le C(1+|x|)^{-k}, \qquad f_1(x) \ge C\eta_n(1+|x|)^{-k}.$

Here, $C > 0$ is a generic constant which only depends on (some or all of) the
parameters $\alpha$, $\beta$, $\epsilon_0$, $q$, $a$, $A$, $k$, $\vartheta_0$ and $\theta_0$. The same rule applies below.
Next, the following lemma elaborates the tail behavior of $w_2$.

Lemma 6.2. For sufficiently large $|u|$ and $n$, there is a constant $C > 0$
such that

(6.5) $\big||u|^k w_2(u) - 1\big| \le C/|u|.$

Last, the following lemma describes how close $f_1$ and $f_2$ are in the
frequency domain.

Lemma 6.3. When $0 \le |t| \le \tau_n$, $\hat f_1(t) = \hat f_2(t)$. When $|t| > \tau_n$, there is a
constant $C > 0$ such that for sufficiently large $n$,

$$|\hat f_1^{(m)}(t) - \hat f_2^{(m)}(t)| \le C|t|^m e^{-a^2t^2/2}, \qquad m = 0, 1, \ldots, k/2.$$
Lemmas 6.1-6.3 are proved in the Appendix.

We are now ready to prove (a)-(c). Consider (a) first. By Lemmas 6.1
and 6.2, both $w_1$ and $w_2$ are positive for sufficiently large $|u|$. Therefore, (a)
holds once we take $\vartheta_0$ sufficiently small.

Consider (b) next. Recall that the $\chi^2$-distance is $d(f_1, f_2) = \int [(f_2(x) - f_1(x))^2 / f_1(x)]\,dx$. By (6.4) in Lemma 6.1,

(6.6) $\int \frac{(f_2(x) - f_1(x))^2}{f_1(x)}\,dx \le C\eta_n^{-1} \int (1 + |x|)^k |f_2(x) - f_1(x)|^2\,dx \le C\eta_n^{-1}(I + II)$,

where $I = \int |f_2(x) - f_1(x)|^2\,dx$ and $II = \int |x|^k (f_2(x) - f_1(x))^2\,dx$. Now, by Parseval's formula [Mallat (1998)], for any integer $0 \le m \le k/2$,

(6.7) $\int x^{2m} |f_2(x) - f_1(x)|^2\,dx = \int |x^m f_2(x) - x^m f_1(x)|^2\,dx = \frac{1}{2\pi} \int |\hat f_1^{(m)}(t) - \hat f_2^{(m)}(t)|^2\,dt$,

where by Lemma 6.3, the last term satisfies

(6.8) $\int x^{2m} |f_2(x) - f_1(x)|^2\,dx \le C \int_{|t| > \tau_n} |t|^m e^{-a^2 t^2/2}\,dt$.

Now, applying (6.8) with $m = 0$ and $m = k/2$ gives

(6.9) $I + II \le C \int_{|t| > \tau_n} (1 + |t|^{k/2}) e^{-a^2 t^2/2}\,dt \le C\tau_n^{k/2 - 1} e^{-a^2 \tau_n^2/2}$,

and (b) follows since $\beta < 1/2$ and $a\tau_n = \sqrt{3\log n}$.
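The last step is a short piece of exponent arithmetic, which the following sketch checks numerically. It assumes the calibration $\eta_n = \varepsilon_0 n^{-\beta}$ (not restated in this section) and sets all generic constants to 1; the function name and the parameter values are illustrative only. With $a\tau_n = \sqrt{3\log n}$ one has $e^{-a^2\tau_n^2/2} = n^{-3/2}$, so $n$ times the bound behaves like $n^{\beta - 1/2}\tau_n^{k/2-1}$, which tends to 0 when $\beta < 1/2$.

```python
import math

def chi2_bound_times_n(n, beta, k, a=1.0, eps0=1.0):
    """n * eta_n^{-1} * tau_n^{k/2-1} * exp(-a^2 tau_n^2 / 2), constants set to 1.

    Assumes eta_n = eps0 * n^{-beta} and a * tau_n = sqrt(3 log n).
    """
    tau = math.sqrt(3.0 * math.log(n)) / a
    eta = eps0 * n ** (-beta)          # assumed calibration of eta_n
    bound = (1.0 / eta) * tau ** (k / 2 - 1) * math.exp(-a * a * tau * tau / 2.0)
    return n * bound

# Illustrative values: beta < 1/2 and an even k.
for n in (1e6, 1e12, 1e24):
    print(n, chi2_bound_times_n(n, beta=0.3, k=6))
```

The printed values decrease toward 0, which is the statement that the $\chi^2$-distance is $o(1/n)$; the decay is slow because of the logarithmic factor $\tau_n^{k/2-1}$.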

Last, we show (c). It is sufficient to check that both $f_1$ and $f_2$ satisfy (2.2) and (2.3). Consider $f_1$ first. Recall that $M_q$ is the $q$th moment of $N(0, 1)$; combining (6.3) and (6.4) gives

$\int |x|^q f_1(x)\,dx \le [(1 - \eta_n)a^q + \eta_n (a^2 + 1)^{q/2}] M_q + C\vartheta_0 \eta_n \le (a^2 + 1)^{q/2} M_q + C\vartheta_0$.

Therefore, by the assumption $A > \sqrt{a^2 + 1}\, M_q^{1/q}$, (2.2) is satisfied once we take $\vartheta_0$ sufficiently small. At the same time, recall that $\hat h_1(t) = e^{-t^2/2} + \vartheta_0 w_1(t)$ and that $w_1(t) = |t|^{-\alpha}$ when $|t| > 4/3$, so (2.3) is also satisfied.
Consider $f_2$ next. By Lemma 6.1 and the choice of $k$, the $2q$th moment of $f_1$ is finite. Using Hölder's inequality and (b),

$\int |x|^q |f_1(x) - f_2(x)|\,dx \le \left(\int |x|^{2q} f_1(x)\,dx\right)^{1/2} \left(\int \frac{(f_1(x) - f_2(x))^2}{f_1(x)}\,dx\right)^{1/2} = o(1/\sqrt{n})$.

Now, by the triangle inequality, $\int |x|^q f_2(x)\,dx \le \int |x|^q f_1(x)\,dx + o(1/\sqrt{n})$, so $f_2$ satisfies the moment constraint in (2.2). At the same time, recall that $\hat h_2(t) = e^{-(1 - \delta_n)t^2/2} + \vartheta_0 w_2(t)$ and that

U, \t\>T n + l. 

By elementary calculus and the choice of $\tau_n$ and $\delta_n$, there is a constant $C > 0$ such that for sufficiently large $n$ and $|t| > 4/3$,

$|w_2(t) - w_1(t)| \le C\theta_0 \tau_n^{-(\alpha+2)} t^2 \le C\theta_0 |t|^{-\alpha}$,
$|w_2'(t) - w_1'(t)| \le C\theta_0 \tau_n^{-(\alpha+2)} |t| \le C\theta_0 |t|^{-(\alpha+1)}$,

where we have used $w_1(t) = |t|^{-\alpha}$ for $|t| > 4/3$. Combining these, we conclude that for a sufficiently small $\theta_0$, $\hat h_2$ satisfies (2.3). This concludes the proof of (c) and the first claim of Theorem 2.1.

We now consider the second claim of Theorem 2.1. Similarly, the goal is to construct two density functions (say $f_3$ and $f_4$) in $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$ such that the null mean parameters $u_0$ associated with them differ by a small amount, and their $\chi^2$-distance is equal to $o(1/n)$. Let $\tau_n$, $s_2$, and $w_1$ be the same as in the proof associated with $\sigma_0^2$, and let $\theta_0 > 0$ be a constant to be determined. Define

(6.10) 

W3 = W\ 



Wi[t) = s 2 [t) ■ w 3 (t) - sin" 



and define through its characteristic function by 

2i 1 - r? n . / 
sin 

#0 Vn V 2 

We construct 

h 3 (t) = e^'/ 2 [ e -' 2 /2 + Q . ^ = e -i6nt/2 [e -1?/2 + ^ . 

and 

(6.11) f Z (x) = (1 - Vn)U(^j +Vnj 

(6.12) / 4 (x) = (1 - % )V^-M + r, n J±J X -^- U y A (u)du. 

Note that the null parameters associated with $f_3$ and $f_4$ differ by an amount of $\delta_n$. We are able to show that for appropriately small constants $\vartheta_0 > 0$ and $\theta_0 > 0$, $h_3$ and $h_4$ are indeed densities, and $f_3$ and $f_4$ live in $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$. Also, the $\chi^2$-distance between $f_3$ and $f_4$ is equal to $o(1/n)$. As the proofs are similar to that associated with $\sigma_0^2$, we skip them for reasons of space.

6.2. Proof of Theorem 2.2. Since the proofs are similar, we only prove the first claim. The following lemmas are proved in the Appendix.

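Lemma 6.4. Fix $q > 4$ and $\gamma \in (0, 1/2)$. For sufficiently large $n$, and any event $B_n$ with $P\{B_n^c\} \le C/n$, $E[(\sigma_0^2(\varphi_n, \hat t_n(\gamma)) - \sigma_0^2)^2 \cdot 1_{\{B_n^c\}}] \le Cn^{2\gamma - 1}$.

Lemma 6.5. Fix $q > 4$ and $\gamma \in (0, 1/2)$. For sufficiently large $n$,

$E|\varphi_n'(\hat t_n(\gamma)) - \varphi'(\hat t_n(\gamma))|^2 \le C\log(n)/n$.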

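We now proceed to show the theorem. Fix $q_1 > 3$, introduce the event

(6.13) $D_0 = \left\{ \frac{1}{n}\sum_{j=1}^n |X_j| \le m_1 + 1,\ \frac{1}{n}\sum_{j=1}^n X_j^2 \le m_2 + 1,\ W_0(\varphi_n; n) \le \sqrt{2q_1\log n}/\sqrt{n} \right\}$,

where $m_1$ and $m_2$ are the first two moments of $X_1$ and

$W_0(\varphi_n; n) = W_0(\varphi_n; n, X_1, X_2, \ldots, X_n) = \sup_{\{0 \le t \le \log n\}} |\varphi_n(t) - \varphi(t)|$.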

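Note that, first, by Chebyshev's inequality,

$P\left\{\frac{1}{n}\sum_{j=1}^n |X_j| > m_1 + 1\right\} \le C/n$, $P\left\{\frac{1}{n}\sum_{j=1}^n X_j^2 > m_2 + 1\right\} \le C/n$.

Second, by Lemma A.2 of Jin and Cai (2007),

$P\{W_0(\varphi_n; n) > \sqrt{2q_1\log n}/\sqrt{n}\} \le 4\log^2(n)\, n^{-q_1/3}$.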
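To give a feel for the size of the threshold in the event $D_0$, the following sketch simulates the maximum deviation of the empirical characteristic function over $[0, \log n]$. It is only an illustration under stated assumptions: the data are drawn from $N(0, 1)$ (so that $\varphi(t) = e^{-t^2/2}$ is known in closed form), the supremum is approximated on a finite grid, and the values of $n$ and $q_1$ are arbitrary.

```python
import cmath
import math
import random

def sup_ecf_deviation(xs, num_grid=200):
    """Grid approximation of sup_{0 <= t <= log n} |phi_n(t) - phi(t)| for N(0,1) data."""
    n = len(xs)
    t_max = math.log(n)
    worst = 0.0
    for i in range(num_grid + 1):
        t = t_max * i / num_grid
        ecf = sum(cmath.exp(1j * t * x) for x in xs) / n   # empirical characteristic function
        cf = math.exp(-t * t / 2.0)                        # N(0,1) characteristic function
        worst = max(worst, abs(ecf - cf))
    return worst

random.seed(0)
n, q1 = 2000, 3.5                                          # illustrative choices, q1 > 3
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
w0 = sup_ecf_deviation(xs)
threshold = math.sqrt(2.0 * q1 * math.log(n)) / math.sqrt(n)
tail_bound = 4.0 * math.log(n) ** 2 * n ** (-q1 / 3.0)
print(w0, threshold, tail_bound)
```

In this simulated sample the observed deviation sits well below the threshold, in line with the small probability bound $4\log^2(n)\,n^{-q_1/3}$.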



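Recall that $q_1 > 3$; it thus follows that $P\{D_0^c\} \le C/n$. By Lemma 6.4, $D_0^c$ only has a negligible contribution to the mean squared error:

(6.14) $E[(\sigma_0^2(\varphi_n, \hat t_n(\gamma)) - \sigma_0^2)^2 \cdot 1_{\{D_0^c\}}] \le Cn^{2\gamma - 1}$,

and all that remains to show is

(6.15) $E[(\sigma_0^2(\varphi_n, \hat t_n(\gamma)) - \sigma_0^2)^2 \cdot 1_{\{D_0\}}] \le C[n^{-2\beta}\log^{-(\alpha+2)}(n) + \log(n)\, n^{2\gamma - 1}]$.

We now show (6.15). Write for short $\hat t_n = \hat t_n(\gamma)$ and $t_n = t_n(\gamma)$. By the triangle inequality,

$|\sigma_0^2(\varphi_n, \hat t_n) - \sigma_0^2| \le |\sigma_0^2(\varphi_n, \hat t_n) - \sigma_0^2(\varphi, \hat t_n)| + |\sigma_0^2(\varphi, \hat t_n) - \sigma_0^2(\varphi, t_n)| + |\sigma_0^2(\varphi, t_n) - \sigma_0^2|$.

So to show (6.15), all we need to show are

(6.16) $E[(\sigma_0^2(\varphi_n, \hat t_n) - \sigma_0^2(\varphi, \hat t_n))^2 \cdot 1_{\{D_0\}}] \le C\log(n)\, n^{2\gamma - 1}$,

(6.17) $E[(\sigma_0^2(\varphi, \hat t_n) - \sigma_0^2(\varphi, t_n))^2 \cdot 1_{\{D_0\}}] \le Cn^{2\gamma - 1}$

and

(6.18) $|\sigma_0^2(\varphi, t_n) - \sigma_0^2| \le Cn^{-\beta}\log^{-(\alpha+2)/2}(n)$ over $D_0$.

Below, we show (6.16)-(6.18) separately.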

Consider (6.16) first. By Lemmas A.2 and A.3 of Jin and Cai (2007), over the event $D_0$,

(6.19) $|\varphi_n(\hat t_n) - \varphi(\hat t_n)| \le C\sqrt{\log n}/\sqrt{n}$, $|\hat t_n - t_n| \le c_0 n^{\gamma - 1/2}$,



where c$ > gq yqxf^i is a constant. Apply Lemma 6.1 of Jin and Cai (2006) 
with f = ip n , g = f, and t = i n , 



(6.20) 



|Oo(<A^*n) - Oo(<P,t n )\ 



1 



3o Q Wn{in) ~ <p(tn)\ + fWrJM ~ ^ tfn)\ 



Combining (6.19) and (6.20) gives that, over the event Dq, 



— + H<Pn(*n) ~<P [t r 



and applying Lemma 6.5 gives (6.16).

Consider (6.17) next. Direct calculations show that $|\frac{d}{dt}\sigma_0^2(\varphi, t)| \le C$ for sufficiently large $t$. Using the second part of (6.19),

(6.21) $|\sigma_0^2(\varphi, \hat t_n) - \sigma_0^2(\varphi, t_n)| \le C|\hat t_n - t_n| \le Cn^{\gamma - 1/2}$ over $D_0$,

and (6.17) follows directly.

Last, we consider (6.18). Similar to Lemma 6.5 of Jin and Cai (2007), 
|<rg(p,t„) - 0o I < C ^it^i where = e n /e i *( u - u °)-( (T2 - CT o)t 2 /2 x 

h(u\a) dH (a) . By direct calculations, 



W{t)\=e n 

< I + II, 



- «o) - (0 2 - 4)ty t( - u - u ^-( a2 -°& t2 / 2 h(u\a) dH{a) 



where 



I = £ r 



f{u- uoy^-^-^-^^h^a) dH{a) 
and



11 = e n 



a 2 - al)te 



it(u-u )-(cT 2 -a 2 l )t 2 /2 



h{u\a)dH(a) 



By elementary Fourier analysis and the definition of h(t\a) and h(t\a) [see 
(2-4)], 



~ti(t\a)e 



~{o 2 -o 2 )t 2 /2 



dH(a) 



<e n / \h'(t\a)\dH(a) 



and 



II<e r 



j h(t\a)(a 2 - a 2 )te-^- a ^ t2 / 2 dH{a) <C(e n /t) J 



<C(e n /t) / \h{t\a)\dH{a) 



where we have used the fact that $\sup_{a > 0}\{a t e^{-a t^2/2}\} \le C/t$. Combining these with (2.3) and (2.5) gives (6.18). This concludes the proof of Theorem 2.2.
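The elementary bound used in the last step can be verified directly: for fixed $t > 0$, the map $a \mapsto a t e^{-a t^2/2}$ is maximized at $a = 2/t^2$, where it equals $2/(e t)$. The sketch below compares a grid search with this closed form; the function name and grid settings are illustrative.

```python
import math

def sup_over_a(t, grid_size=20000):
    """Grid-search approximation of sup_{a > 0} a * t * exp(-a * t^2 / 2)."""
    a_star = 2.0 / (t * t)             # maximizer obtained by setting the derivative to zero
    a_max = 10.0 * a_star              # search on (0, 10 * a_star]
    best = 0.0
    for i in range(1, grid_size + 1):
        a = a_max * i / grid_size
        best = max(best, a * t * math.exp(-a * t * t / 2.0))
    return best

for t in (0.5, 1.0, 3.0, 10.0):
    print(t, sup_over_a(t), 2.0 / (math.e * t))
```

The two columns agree, so the constant in the bound can be taken to be $C = 2/e$.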



6.3. Proof of Theorem 3.2. Consider the first claim first. Similar to the construction of the minimax lower bound for estimating the null parameter $\sigma_0^2$, the goal is to construct two density functions (say $f_5$ and $f_6$) in $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$ such that the proportions associated with them differ by a small amount, and their $\chi^2$-distance is equal to $o(1/n)$.

We construct $f_5$ and $f_6$ as follows. Let $\tau_n$, $w_1$, and $s_2$ be the same as in Section 6.1. Similarly, for a constant $\theta_0 > 0$ to be determined, let



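(6.22) $\delta_n = \theta_0 \vartheta_0 \eta_n \tau_n^{-\alpha}$, $w_5 = w_1$

and

$w_6(t) = s_2(t) \cdot \left( \frac{\eta_n - \delta_n}{\eta_n}\, w_5(t) + \frac{\delta_n}{\vartheta_0 \eta_n}\,(1 - e^{-t^2/2}) \right)$.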



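We define $h_5$ and $h_6$ as

$\hat h_5(t) = e^{-t^2/2} + \vartheta_0 \cdot w_5(t)$, $\hat h_6(t) = e^{-t^2/2} + \vartheta_0 \cdot w_6(t)$,

and

(6.23) $f_5(x) = (1 - \eta_n + \delta_n)\,\frac{1}{a}\phi\!\left(\frac{x}{a}\right) + (\eta_n - \delta_n) \int \frac{1}{a}\phi\!\left(\frac{x - u}{a}\right) h_5(u)\,du$,

(6.24) $f_6(x) = (1 - \eta_n)\,\frac{1}{a}\phi\!\left(\frac{x}{a}\right) + \eta_n \int \frac{1}{a}\phi\!\left(\frac{x - u}{a}\right) h_6(u)\,du$.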



Note that the proportions associated with $f_5$ and $f_6$ differ by an amount of $\delta_n$. We are able to show that for appropriately small constants $\vartheta_0 > 0$ and $\theta_0 > 0$, $h_5$ and $h_6$ are indeed densities, and $f_5$ and $f_6$ live in $\mathcal{F}_0(\alpha, \beta, \varepsilon_0, q, a, A; n)$. Also, the $\chi^2$-distance between $f_5$ and $f_6$ is equal to $o(1/n)$. As the proofs are similar to the case for $\sigma_0^2$, we skip them for reasons of space.

We now consider the second claim. Write for short $\hat\epsilon_n^* = \hat\epsilon_n^*(\gamma)$, $\hat\sigma_0^2 = \hat\sigma_0^2(\gamma)$, and $\hat u_0 = \hat u_0(\gamma)$, and introduce the nonstochastic counterparts of $\hat\sigma_0^2$ and $\hat u_0$, respectively, by

$\tilde\sigma_0^2 = \sigma_0^2(\varphi, t_n)$, $\tilde u_0 = u_0(\varphi, t_n)$,

where $t_n$ is defined in (2.23). The following lemma is a direct result of Theorem 1 of Jin and Cai (2007), which elaborates the stochastic fluctuation of $\hat\sigma_0^2$ and $\hat u_0$.

Lemma 6.6. Let $\gamma \in (0, 1/2)$ and $q > 4 + 2\gamma$ be as in the theorem. There is an event $B_n$ such that $P\{B_n^c\} = o(1/n)$ and over the event $B_n$,

(6.25) $|\hat\sigma_0^2 - \tilde\sigma_0^2| \le C\log^{1/2}(n)\, n^{\gamma - 1/2}$, $|\hat u_0 - \tilde u_0| \le C\log(n)\, n^{\gamma - 1/2}$.

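Now, by replacing $\hat u_0$ with $\tilde u_0$ in the definition of $\hat\epsilon_n^*$, we introduce the following pseudo-estimator:

(6.26) $\tilde\epsilon_n = \tilde\epsilon_n(\gamma, X_1, \ldots, X_n, \tilde u_0) = 1 - n^{\gamma - 1} \sum_{j=1}^n \cos\!\left(\sqrt{2\gamma\log n}\,\frac{X_j - \tilde u_0}{\hat\sigma_0}\right)$.

The pseudo-estimator plays a key role in the proof. To see the point, we need some notation. Let $\varphi_n$ be the empirical characteristic function corresponding to $(X_j - \tilde u_0)/\tilde\sigma_0$,

(6.27) $\varphi_n(t) = \varphi_n(t; X_1, \ldots, X_n; \tilde u_0, \tilde\sigma_0) = \frac{1}{n}\sum_{j=1}^n e^{it(X_j - \tilde u_0)/\tilde\sigma_0}$,

let $\varphi(t)$ be the corresponding (underlying) characteristic function

$\varphi(t) = \varphi(t; f, \tilde u_0, \tilde\sigma_0) = E[\varphi_n(t)]$

and denote the real part of $\varphi_n$ and $\varphi$ by $\varphi_n^R$ and $\varphi^R$, respectively. Observe that if we denote

(6.28) $\hat t_n = \hat t_n(\gamma; \hat\sigma_0, \tilde\sigma_0) = \frac{\tilde\sigma_0}{\hat\sigma_0}\sqrt{2\gamma\log n}$,

then $\tilde\epsilon_n$ can be rewritten as

(6.29) $\tilde\epsilon_n = 1 - n^{\gamma}\varphi_n^R(\hat t_n)$.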
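The mechanism behind estimators of the form $1 - n^{\gamma}\varphi_n^R(t)$ with $t = \sqrt{2\gamma\log n}$ can be seen in a small simulation. The sketch below is not the procedure analyzed in the proof: it assumes the null parameters $u_0 = 0$ and $\sigma_0 = 1$ are known rather than estimated, and it uses a hypothetical two-component mixture with nonnull effects from $N(0, 10)$, for which the nonnull characteristic function is negligible at frequency $t$. The function name and all numerical settings are illustrative.

```python
import math
import random

def proportion_estimate(xs, gamma, u0=0.0, sigma0=1.0):
    """1 - n^gamma * Re(phi_n(t)) at t = sqrt(2 gamma log n), with known null (u0, sigma0)."""
    n = len(xs)
    t = math.sqrt(2.0 * gamma * math.log(n))
    real_ecf = sum(math.cos(t * (x - u0) / sigma0) for x in xs) / n
    return 1.0 - n ** gamma * real_ecf

random.seed(1)
n, eps, gamma = 20000, 0.2, 0.2        # illustrative sample size, proportion, tuning parameter
xs = [
    random.gauss(0.0, math.sqrt(10.0)) if random.random() < eps else random.gauss(0.0, 1.0)
    for _ in range(n)
]
est = proportion_estimate(xs, gamma)
print(est)
```

Under the null $N(0, 1)$, $e^{-t^2/2} = n^{-\gamma}$ at this frequency, so multiplying by $n^{\gamma}$ restores the null mass $1 - \epsilon$ while the nonnull contribution stays small; the estimate is therefore close to the true proportion, up to fluctuations of order $n^{\gamma - 1/2}$.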

The advantage of introducing $\tilde\epsilon_n$ is two-fold. First, by elementary trigonometry, the difference between $\hat\epsilon_n^*$ and $\tilde\epsilon_n$ has a very simple form. This is the following lemma, whose proof is elementary so we omit it.

Lemma 6.7.
i* n -e n = n 7 Re( <p n (i n ) ■ 



sin 



t_n Up ~ Up 
2 a 



~ Up -Up 

ism I t n — = 



Second, the stochastic fluctuation of $\tilde\epsilon_n$ can be conveniently bounded through the maximum deviation of $\varphi_n(t)$ over the interval, say, $[0, \log(n)]$. In detail, fix a constant $q_1 > 3$, introduce the following event:

$D_0 = \left\{ \sup_{\{0 \le t \le \log n\}} |\varphi_n(t) - \varphi(t)| \le \sqrt{2q_1\log n}/\sqrt{n} \right\}$.

The following lemma can be proved similarly to Lemma A.2 in Jin and Cai (2007), so we omit the proof.

Lemma 6.8. $P\{D_0^c\} \le 4\log^2(n)\, n^{-q_1/3}$.

A direct consequence of Lemma 6.8 is that

(6.30) $E|\varphi_n(\hat t_n) - \varphi(\hat t_n)|^2 \le E\big[\sup_{\{0 \le t \le \log n\}} |\varphi_n(t) - \varphi(t)|^2\big] + o(1/n) \le C\log(n)/n$.

Given the lemmas above, what remains to analyze is $\varphi^R(\hat t_n)$. Note that $\hat t_n$ fluctuates around $\sqrt{2\gamma\log n}$. We have the following lemma, which is proved in the Appendix.

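Lemma 6.9. Let $B_n$ be the event as in Lemma 6.6. We have

$|\varphi^R(\hat t_n) - \varphi^R(\sqrt{2\gamma\log n})| \le C\log^{3/2}(n)\, n^{-1/2}$ over $B_n$

and

$|(1 - \epsilon_n) - n^{\gamma}\varphi^R(\sqrt{2\gamma\log n})| \le C\epsilon_n\log^{-\alpha/2}(n)$.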



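We are now ready to show the theorem. By the triangle inequality and the Cauchy-Schwarz inequality,

(6.31) $|\hat\epsilon_n^* - \epsilon_n|^2 \le (|\hat\epsilon_n^* - \tilde\epsilon_n| + |\tilde\epsilon_n - (1 - n^{\gamma}\varphi^R(\hat t_n))| + |(1 - n^{\gamma}\varphi^R(\hat t_n)) - \epsilon_n|)^2 \le C(|\hat\epsilon_n^* - \tilde\epsilon_n|^2 + |\tilde\epsilon_n - (1 - n^{\gamma}\varphi^R(\hat t_n))|^2 + |(1 - n^{\gamma}\varphi^R(\hat t_n)) - \epsilon_n|^2)$.

First, by (6.29) and (6.30),

(6.32) $E|\tilde\epsilon_n - (1 - n^{\gamma}\varphi^R(\hat t_n))|^2 = n^{2\gamma} E|\varphi_n^R(\hat t_n) - \varphi^R(\hat t_n)|^2 \le C\log(n)\, n^{2\gamma - 1}$.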
Second, by Lemma 6.9 and the Cauchy-Schwarz inequality,

(6.33) $E|(1 - \epsilon_n) - n^{\gamma}\varphi^R(\hat t_n)|^2 \le C[\log^{-\alpha/2}(n)\, n^{-\beta} + \log^{3/2}(n)\, n^{\gamma - 1/2}]^2 \le C[\log^{-\alpha}(n)\, n^{-2\beta} + \log^3(n)\, n^{2\gamma - 1}]$.

Plugging this into (6.31) gives

(6.34) $E|\hat\epsilon_n^* - \epsilon_n|^2 \le C[E|\hat\epsilon_n^* - \tilde\epsilon_n|^2 + \log^{-\alpha}(n)\, n^{-2\beta} + \log^3(n)\, n^{2\gamma - 1}]$.

Compare (6.34) with the theorem. All that remains to show is

(6.35) $E|\hat\epsilon_n^* - \tilde\epsilon_n|^2 \le C\log^3(n)\, n^{2\gamma - 1}$.

We now show (6.35). Note that $|\hat\epsilon_n^* - \tilde\epsilon_n|^2 \le Cn^{2\gamma}$ and $P\{D_0^c \cup B_n^c\} = o(1/n)$, so

$E[|\hat\epsilon_n^* - \tilde\epsilon_n|^2 \cdot 1_{\{D_0^c \cup B_n^c\}}] \le o(n^{2\gamma - 1})$,

and all we need to show is

(6.36) $E[|\hat\epsilon_n^* - \tilde\epsilon_n|^2 \cdot 1_{\{D_0 \cap B_n\}}] \le C\log^3(n)\, n^{2\gamma - 1}$.

To this end, note that over the event $D_0 \cap B_n$, by Lemma 6.7 and that $|\sin(x)| \le C|x|$ for all $x$,

(6.37) $|\hat\epsilon_n^* - \tilde\epsilon_n|^2 \le Cn^{2\gamma}\, |\varphi_n(\hat t_n)|^2\, \hat t_n^2\, (\hat u_0 - \tilde u_0)^2$.

Now, first, by Lemma 6.6,

(6.38) $\hat t_n \sim \sqrt{2\gamma\log n}$, $|\hat u_0 - \tilde u_0| \le C\log(n)\, n^{\gamma - 1/2}$.

Second, by Lemma 6.8 and the Cauchy-Schwarz inequality,

$|\varphi_n(\hat t_n)|^2 \le 2\left(\frac{2q_1\log n}{n} + |\varphi(\hat t_n)|^2\right)$,

where according to Lemma 6.9,

$|\varphi(\hat t_n)| \le Cn^{-\gamma}$.

Therefore, over the event $D_0 \cap B_n$,

(6.39) $|\varphi_n(\hat t_n)|^2 \le Cn^{-2\gamma}$.

Inserting (6.38) and (6.39) into (6.37) gives (6.36), and concludes the proof of the theorem.

APPENDIX 

We shall prove in this section the technical lemmas which are used in the 
proofs of the main results in the previous sections. 
A.1. Proof of Lemma 6.1. Consider the first claim first. The symmetry of $w_1$ implies

$\hat w_1(u) = \frac{1}{2\pi}\int e^{-itu} w_1(t)\,dt = \frac{1}{\pi}\int_0^\infty \cos(tu)\, w_1(t)\,dt$.
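The reduction of the Fourier inversion of a symmetric function to a cosine integral over the half line can be checked numerically. The sketch below uses the Gaussian $e^{-t^2/2}$ as a stand-in for $w_1$ (an assumption made only because its inversion, the standard normal density, is known in closed form) and evaluates the cosine integral with the trapezoid rule; the function names, truncation point and step count are illustrative.

```python
import math

def inversion_via_cosine(w, u, t_max=12.0, steps=24000):
    """(1/pi) * int_0^inf cos(t u) w(t) dt, trapezoid rule truncated at t_max."""
    h = t_max / steps
    total = 0.5 * (w(0.0) + math.cos(t_max * u) * w(t_max))
    for i in range(1, steps):
        t = i * h
        total += math.cos(t * u) * w(t)
    return total * h / math.pi

def gauss_cf(t):
    # Stand-in symmetric function: the N(0,1) characteristic function.
    return math.exp(-t * t / 2.0)

for u in (0.0, 0.5, 1.0, 2.0):
    exact = math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    print(u, inversion_via_cosine(gauss_cf, u), exact)
```

The two columns agree to high precision. For $w_1$ itself the same cosine integral is what the integration by parts in the proof is applied to.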



Note that $w_1$ is smooth in $(0, \infty)$ and $w_1^{(k-1)}(0) = (-1)^{k/2}\pi$. Repeatedly using integration by parts $k$ times yields

(A.1) $\frac{1}{\pi}\int_0^\infty \cos(tu)\, w_1(t)\,dt = u^{-k} + r_1(u)$, $u > 0$,

where

$|r_1(u)| = \frac{1}{\pi|u|^k}\left|\int_0^\infty \cos(tu)\, w_1^{(k)}(t)\,dt\right| = \frac{1}{\pi|u|^{k+1}}\left|\int_0^\infty \sin(tu)\, w_1^{(k+1)}(t)\,dt\right|$.

Direct calculations show that there is a constant $C = C(\alpha, k) > 0$ such that

$|w_1^{(k+1)}(t)| \le C(1 + |t|)^{-(\alpha+k+1)}$,

so

(A.2) $|r_1(u)| \le C|u|^{-(k+1)}$.

Combining (A.1) and (A.2) gives the claim.

Next, consider the second claim. It is sufficient to show that for sufficiently large $x$,

(A.3) $g(x) \ge C|x|^{-k}$.

By the way $g$ is defined,

(A.4) $g(x) = \int \phi_a(u)\, \hat w_1(x - u)\,du = I + II$,

where

$I = \int_{|u| > x/2} \hat w_1(x - u)\phi_a(u)\,du$, $II = \int_{|u| \le x/2} \hat w_1(x - u)\phi_a(u)\,du$.

First, we have

(A.5) $|I| \le C\phi_a(x/2)$.

Second, by the first claim, there are generic constants $C_2 > C_1 > 0$ such that for sufficiently large $x$ and $|u| \le x/2$,

$C_1|x|^{-k} \le \hat w_1(x - u) \le C_2|x|^{-k}$,

and so

(A.6) $C_1(1 + |x|)^{-k} \le II \le C_2|x|^{-k}$.

Inserting (A.5) and (A.6) into (A.4) gives (A.3).

Last, consider the third claim. Recall that [i.e., (6.3)]

$f_1(x) = (1 - \eta_n)\phi_a(x) + \eta_n\phi_{\sqrt{a^2+1}}(x) + \vartheta_0\eta_n g(x)$.

Once we take $\vartheta_0$ appropriately small, the claim follows from (A.3).
A.2. Proof of Lemma 6.2. Similarly, write

$\hat w_2(u) = \frac{1}{\pi}\int_0^\infty \cos(tu)\, w_2(t)\,dt = u^{-k} + r_2(u)$, $u > 0$,

where

(A.7) $|r_2(u)| = \frac{1}{\pi|u|^{k+1}}\left|\int_0^{\tau_n+1} \sin(tu)\, w_2^{(k+1)}(t)\,dt\right| \le \frac{1}{\pi|u|^{k+1}}\int_0^{\tau_n+1} |w_2^{(k+1)}(t)|\,dt$.

Compare (A.7) with the lemma; it is sufficient to show that for sufficiently large $n$,

(A.8) $\int_0^{\tau_n+1} |w_2^{(k+1)}(t)|\,dt \le C$,

which is equivalent to

(A.9) $\int_2^{\tau_n+1} |w_2^{(k+1)}(t)|\,dt \le C$.

We now show (A.9). To do so, we limit our attention to $2 \le |t| \le \tau_n + 1$. Recall that $w_2(t) = s_2(t)w(t)$, where

$w(t) = e^{\delta_n t^2/2} w_1(t) + \frac{1}{\vartheta_0}\,\frac{1 - \eta_n}{\eta_n}\,(e^{\delta_n t^2/2} - 1)$.

First, by the way $\delta_n$ is defined,

(A.10) $|w(t)| \le C[|t|^{-\alpha} + t^2\tau_n^{-(\alpha+2)}] \le C|t|^{-\alpha}$.

Second, fix m = 1, 2, . . . , k + 1, write 

m 1 1 _ 

(A.ll) w^ m \t) = Y J {e Snt2,2 ) {m - ]) w^\t) + J-^i( e ^* 2 /2 ) M_ 

Recall that wi(t) = \t\~ a . By elementary calculus, there is a constant C = 
C(k) > such that 

(A.12) | (e 5nt 2 /2)(m)| < C5nt ^ |^ m )(t)| < 

Combining (A.ll) and (A.12) gives 

\w^{t)\<C5 n \t\ l - a + C-^- 

(A.13) M " 

< C^ltl 1 "" + Cr-( Q+1) , m = 1, 2, . . . , k + 1. 
Last, direct calculations show that
(A.14) |4 m) (£)|<C, m = 0,l,...,k. 

Combining (A. 10), (A. 13) and (A.14) gives 

(A.15) \w£ +1 \t)\<C[8 n \t\ 1 - a + T-^ r > + \t\- a ] 

and (A. 9) follows directly. 

A.3. Proof of Lemma 6.3. The first claim follows by the way that $f_2$ is constructed. Consider the second claim. Recall that

$\hat f_1(t) = \eta_n e^{-(a^2+1)t^2/2} + e^{-a^2 t^2/2}[(1 - \eta_n) + \vartheta_0\eta_n w_1(t)]$,

and that

$\hat f_2(t) = \eta_n e^{-(a^2+1)t^2/2} + e^{-(a^2+\delta_n)t^2/2}[(1 - \eta_n) + \vartheta_0\eta_n w_2(t)]$.

Fix $0 \le m \le k$. On one hand,

$|(e^{-a^2 t^2/2})^{(m)}(t)| \le C|t|^m e^{-a^2 t^2/2}$.

On the other hand, by the proof of Lemma 6.2,

$|w_1^{(m)}(t)| \le C$, $|w_2^{(m)}(t)| \le C$.

Combining these gives the claim.

A.4. Proof of Lemma 6.4. Write for short $\hat t_n = \hat t_n(\gamma)$. By elementary calculus, for any $t > 0$,

(A.16) $|\varphi_n(t) - 1| \le \frac{1}{n}\sum_{j=1}^n |e^{itX_j} - 1| \le \frac{t}{n}\sum_{j=1}^n |X_j|$.
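The chain in (A.16) rests on the triangle inequality and on $|e^{ix} - 1| \le |x|$. The following sketch checks both steps on simulated data; the sample (Gaussian with standard deviation 2) and the frequencies are arbitrary illustrative choices.

```python
import cmath
import random

def ecf(t, xs):
    """Empirical characteristic function phi_n(t) = (1/n) sum_j exp(i t X_j)."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

random.seed(2)
xs = [random.gauss(0.0, 2.0) for _ in range(500)]
mean_abs = sum(abs(x) for x in xs) / len(xs)

rows = []
for t in (0.01, 0.1, 0.5, 1.0, 3.0):
    lhs = abs(ecf(t, xs) - 1.0)                                          # |phi_n(t) - 1|
    mid = sum(abs(cmath.exp(1j * t * x) - 1.0) for x in xs) / len(xs)    # (1/n) sum |e^{itX_j} - 1|
    rhs = t * mean_abs                                                   # (t/n) sum |X_j|
    rows.append((t, lhs, mid, rhs))
    print(t, lhs, mid, rhs)
```

Each row is nondecreasing from left to right, as (A.16) asserts; the bound is tight for small $t$ and becomes loose once $t|X_j|$ is no longer small.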

Note that for sufficiently large $n$, $|\varphi_n(\hat t_n)| = n^{-\gamma} \le 1/2$. Applying (A.16) with $t = \hat t_n$ gives

(A.17) $\hat t_n \ge \frac{n}{\sum_{j=1}^n |X_j|}\,|1 - \varphi_n(\hat t_n)| \ge \frac{n/2}{\sum_{j=1}^n |X_j|}$.

Now, first, by direct calculations and the Holder inequality, 
(A-18) „ 
where in the last step we have used $|\varphi_n(\hat t_n)| = n^{-\gamma}$. Second, note that for any $t$,



(A.19) 



. e *Wj 



n ^ J 

i=i 3=1 

Combine (A. 17), (A. 18) and (A.19) and use the Cauchy-Schwarz inequality, 
(A.20) \ai& n ,ty<2rf(^J2\X j \j <2n^f>/). 



Hence, to show the claim, it is sufficient to show 



(A.21) 



E 



< C/n. 



We now show (A.21). Recall that $m_2$ denotes the second moment of $X_1$; we write

(A.22) $\frac{1}{n}\sum_{j=1}^n X_j^2 = m_2 + \frac{z}{\sqrt{n}}$,

where $z = \sqrt{n}\,[\frac{1}{n}\sum_{j=1}^n X_j^2 - m_2]$. It is seen that $Ez^2 \le C$, so by the Hölder inequality,



(A.23) 











E 


y^' 1{BS} . 







1/2 

<(^-P{^}) <C/n 



Inserting (A.23) into (A.22) gives (A.21). This concludes the proof. 

A. 5. Proof of Lemma 6.5. Before we show the Lemma 6.5, we need some 
notation and lemmas. Introduce the event 



(A.24) D x = \ Wi(<p n ;n)< 

where 



^(^/(q- 2) log 77 + 2t77 2 ) 



j? 



Wi((f n ;n) = W 1 (ip n ;n,X 1 ,X 2 ,...,X n ) = sup \<p' n (t) - <p'(t)\, 

{\t~t n \<c n">^/ 2 } 

7772 is the second moment of X±, and cq is a constant defined in (6.19). We 
have the following lemmas. 

Lemma A.l. Fix q > 4 and 7 G (0, 1/2). For sufficiently large 77, 
Lemma A.2. Fix $q > 4$ and $\gamma \in (0, 1/2)$. For sufficiently large $n$,

(A.25) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le C/n$.

Here, $\bar o(n^a)$ denotes a term which equals $o(n^{a+\delta})$ for any $\delta > 0$. The proof of Lemma A.1 is similar to that of Lemma 6.4 of Jin and Cai (2006) so we skip it. Lemma A.2 is the tricky part of the proof of Lemma 6.5 and is proved in Section A.5.1.

We now proceed to prove Lemma 6.5. Write for short $\hat t_n = \hat t_n(\gamma)$ and $t_n = t_n(\gamma)$. By the triangle inequality,

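(A.26) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2] \le E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \cap D_1\}}] + E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0^c\}}] + E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \setminus D_1\}}]$.

First, recall that over the event $D_0$ [i.e., (6.19)],

$|\hat t_n - t_n| \le c_0 n^{\gamma - 1/2}$,

so by the definition of the event $D_1$,

$|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)| \le C\sqrt{\log n}/\sqrt{n}$ over $D_0 \cap D_1$

and

(A.27) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \cap D_1\}}] \le C\log(n)/n$.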

Second, note that for all t, 



where mi is the first moment of X\. It follows that 

2 



(A.28) 



E[\<p' n (t n )-ip'(t n )\ 2 -l {D c } ] 



<C\E 



1 n 



n 

1 v^n 



mi, 



■ 1 



+ 2miP{L>^} 



Moreover, note that E[^ Y2^=i(\^j\ ~ m i)] 4 — C/n 2 , by the Holder inequal- 
ity, 

\ 2 



E 



mi, 



■ 1 
(A.29) < E



1 n 

-Ed*;l-™i) 



n 



4 v 1/ 



< o(l/n). 
Combining (A.28) and (A.29) gives

(A.30) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0^c\}}] \le C/n$

and the claim follows by inserting (A.25), (A.27) and (A.30) into (A.26).

A.5.1. Proof of Lemma A.2. We prove it for the case $\gamma < 1/3$ and the case $\gamma \ge 1/3$ separately.

Consider the case $\gamma < 1/3$ first. By the Taylor expansion, for some $\xi$ that falls between $t_n$ and $\hat t_n$,

(A.31) $\varphi_n'(\hat t_n) - \varphi'(\hat t_n) = \varphi_n'(t_n) - \varphi'(t_n) + (\varphi_n''(\xi) - \varphi''(\xi)) \cdot (\hat t_n - t_n)$.

By direct calculations and the definition of $D_0$,

(A.32) $|\varphi_n''(\xi) - \varphi''(\xi)| \le \frac{1}{n}\sum_{j=1}^n X_j^2 + E[X_1^2] \le C$ over $D_0$.

Also, recall that

(A.33) $|\hat t_n - t_n| \le c_0 n^{\gamma - 1/2}$.

Inserting (A.33) and (A.32) into (A.31) gives

$|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)| \le |\varphi_n'(t_n) - \varphi'(t_n)| + Cn^{\gamma - 1/2}$,

which implies

$|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \le C(|\varphi_n'(t_n) - \varphi'(t_n)|^2 + n^{2\gamma - 1})$.

It follows that

(A.34) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le C(E[|\varphi_n'(t_n) - \varphi'(t_n)|^2] + n^{2\gamma - 1} \cdot P\{D_0 \setminus D_1\})$.

By Lemma A.1 and elementary statistics,

(A.35) $P\{D_0 \setminus D_1\} \le \bar o(n^{\gamma - 1})$, $E[|\varphi_n'(t_n) - \varphi'(t_n)|^2] \le C/n$;

inserting (A.35) into (A.34) gives

$E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le C/n + \bar o(n^{3\gamma - 2})$

and the claim follows by $\gamma < 1/3$.
Next, consider the case $\gamma \ge 1/3$. Fix $\delta \in (\gamma, 2 - 3\gamma)$ and let

$K = K(n, c_0, \gamma, \delta) = c_0 n^{\gamma + \delta/2 - 1/2}$.

Note here that

$\gamma + \delta/2 - 1/2 > \frac{3\gamma - 1}{2} \ge 0$.

Lay out a grid $s_k = t_n + (k - K - 1)n^{-\delta/2}$, $k = 1, 2, \ldots, 2K + 1$. Observe that for any $t \in [s_k, s_{k+1}]$,

(A.36) $|\varphi_n'(t) - \varphi'(t)| \le |\varphi_n'(s_k) - \varphi'(s_k)| + n^{-\delta/2} \cdot \Big(\sup_{\xi \in [s_k, s_{k+1}]} |\varphi_n''(\xi) - \varphi''(\xi)|\Big)$.

Combining (A.36) with (A.32) gives

$|\varphi_n'(t) - \varphi'(t)| \le |\varphi_n'(s_k) - \varphi'(s_k)| + Cn^{-\delta/2}$ over $D_0$.

Now, note that the endpoints of the grid are

$t_n \pm Kn^{-\delta/2} = t_n \pm c_0 n^{\gamma - 1/2}$

and that over the event $D_0$,

$|\hat t_n - t_n| \le c_0 n^{\gamma - 1/2}$;

it follows that

$|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)| \le \max_{\{1 \le k \le 2K+1\}} |\varphi_n'(s_k) - \varphi'(s_k)| + Cn^{-\delta/2}$.
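The grid construction can be made concrete as follows. This is a minimal sketch under stated assumptions: $K$ is rounded up to an integer (the text treats $c_0 n^{\gamma + \delta/2 - 1/2}$ as an integer), $c_0 = 1$, and the center $t_n$ is passed in as a plain number because its definition lies outside this section. The function name and numerical values are illustrative.

```python
import math

def derivative_grid(t_n, n, gamma, delta, c0=1.0):
    """Grid s_k = t_n + (k - K - 1) * n^{-delta/2}, k = 1, ..., 2K + 1."""
    K = math.ceil(c0 * n ** (gamma + delta / 2 - 0.5))    # rounding up is an assumption
    step = n ** (-delta / 2)
    points = [t_n + (k - K - 1) * step for k in range(1, 2 * K + 2)]
    return points, K, step

n, gamma, delta = 10 ** 6, 0.4, 0.6      # gamma >= 1/3 and gamma < delta < 2 - 3 * gamma
t_center = 3.0                           # placeholder value for t_n
points, K, step = derivative_grid(t_center, n, gamma, delta)
half_width = 1.0 * n ** (gamma - 0.5)    # c0 * n^{gamma - 1/2} with c0 = 1
print(K, step, points[0], points[-1], half_width)
```

The grid has $2K + 1$ points, is centered at $t_n$, and covers $t_n \pm c_0 n^{\gamma - 1/2}$, the interval in which $\hat t_n$ lies over $D_0$; the number of points grows like $n^{\gamma + \delta/2 - 1/2}$, which is the factor $K$ that enters (A.40).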

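Therefore, by the Cauchy-Schwarz inequality,

(A.37) $|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \le C\Big(\Big(\max_{\{1 \le k \le 2K+1\}} |\varphi_n'(s_k) - \varphi'(s_k)|^2\Big) + n^{-\delta}\Big)$.

Recall that

(A.38) $P\{D_0 \setminus D_1\} \le \bar o(n^{\gamma - 1})$.

It follows from (A.37) and (A.38) that

(A.39) $E[|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le C\Big(E\Big[\Big(\max_{\{1 \le k \le 2K+1\}} |\varphi_n'(s_k) - \varphi'(s_k)|^2\Big) \cdot 1_{\{D_0 \setminus D_1\}}\Big] + n^{-\delta} P\{D_0 \setminus D_1\}\Big) \le C\sum_{k=1}^{2K+1} E[|\varphi_n'(s_k) - \varphi'(s_k)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] + o(n^{-1})$,

where in the last step we have used $\delta > \gamma$.

Now, for any $k = 1, 2, \ldots, 2K + 1$, observe that by elementary statistics,

$E[|\varphi_n'(s_k) - \varphi'(s_k)|^4] \le C/n^2$.

By the Hölder inequality and (A.38),

$E[|\varphi_n'(s_k) - \varphi'(s_k)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le (E|\varphi_n'(s_k) - \varphi'(s_k)|^4 \cdot P\{D_0 \setminus D_1\})^{1/2} \le \bar o(n^{(\gamma - 3)/2})$,

so by $K \le Cn^{\gamma + \delta/2 - 1/2}$,

(A.40) $\sum_{k=1}^{2K+1} E[|\varphi_n'(s_k) - \varphi'(s_k)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] \le \bar o(Kn^{(\gamma - 3)/2}) = \bar o(n^{3\gamma/2 + \delta/2 - 2})$.

Recall $\delta < 2 - 3\gamma$; it follows from (A.40) that

(A.41) $\sum_{k=1}^{2K+1} E[|\varphi_n'(s_k) - \varphi'(s_k)|^2 \cdot 1_{\{D_0 \setminus D_1\}}] = o(1/n)$

and the claim follows by plugging (A.41) into (A.39).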

A. 6. Proof of Lemma 6.9. Consider the first claim. Write for short i r 
\/2j log n. By the definition and elementary Fourier analysis, 

0* (t ) = (i _ £n)e ~n^ /^ ( t ^z^ 

V °o 

(A.42) 

-i/2( CT M,) 2 t 2 rn J + u -uo 



+ e n J e -WWr coslt^y^ J h{u\a) dH(a). 
By Lemma 6.6, we have that over the event B n , 

(A.43) \a -a \< Clog 1/2 (ra)ra 7 - 1/2 , \u - u \ < Clog^)™ 7 " 1 / 2 . 
As a result, by the Taylor expansion and that t n = 2 s i n , 

\<p R (t n ) - (f> R {t n )\ < |(^)'(*n)| • \tn ~ tn\ 

(A.44) 

<C\og{n)n^ l l 2 \^ R )'(t n )l 

where 

(A.45) |(^)'(y| < Ct n e- 1/2 ^ 0, ^ )2 ^ < C log 1 / 2 (n)n" 7 . 
Combining (A.44) and (A.45) gives the first claim.

Consider the second claim. Introduce a bridging quantity 



(A.46) 



cos t 



Xi - up 



By the triangle inequality, 

(A.47) |(i- £n )-^V(yi<i + n, 



where I = |(1 - e n ) - n^[cos(t n ^^)]| and II = n ^\E\zo^t n ^^)\ 



ip R (t n )\. Consider I first. By direct calculations and e* n//2 = n 7 , 

Xi - u s 



(1 - E n ) - tfE 



(A.48) 



cos( t n 

CO 

l/2[(a/a f-l]P n 



t U-Uq . 

cos( t n \n[u\<j) 



dH{a). 



Note that by elementary Fourier analysis 

. .U-Uq 

cos t 



h(u\a) = Re(h( — 
o"0 J \ V°"o 



Since H is eligible and obeys the constraint (2.3), we have 
(1 — e n ) — n 1 E c 



/- X 1 - u 



V CO 



(A.49) 



< Ce n t n a . 



l/2[( CT /ao) 2 -l]il 








V^O 





dH(a) 



Consider II next. It follows from the proof of Theorem 2.2 [i.e., (6.18)] 
that 



(A.50) 



\a Q -a Q \<Ce n \og~^l 2 {n), 
\u -u \<Ce n log^ a+1 ^ 2 (n). 



Compare (A.48) with (A.42), 



<Cn~ 1 



cos t n 



Xi - u 



C() 

-J2 _2ij2 



(1 - e n )(\a - a \t n + \u - u \t n ) 
+ e n (ct 2 ^|cto - ctq| + \ut n \\u - u \)dH(u,a) 



4G 
Note that $E|u| \le C$ and $E|\sigma^2 - \sigma_0^2| \le C$; it follows from (A.50) that

<Ce n n-^log- a/2 (n). 

Inserting (A.49) and (A.51) into (A.47) gives

$|(1 - \epsilon_n) - n^{\gamma}\varphi^R(t_n)| \le C\epsilon_n\log^{-\alpha/2}(n)$.

This concludes the proof of the second claim of the lemma.